The SenseCAP Watcher is an AI-powered assistant from Seeed Studio. It utilizes the ESP32-S3 microcontroller to provide local vision and voice processing.
The Watcher features a 1.45-inch circular touch screen, an integrated camera, a microphone, and a built-in speaker. It provides Wi-Fi and Bluetooth connectivity and is expandable through a Grove connector, an expansion header with a serial connection, and a microSD card slot.

In this tutorial you will learn how to connect the SenseCAP Watcher to an ESP32, receive detection results via the serial interface, and react to them. We will build a Desk Inactivity Monitor that encourages you to exercise if you're at your desk for more than 60 minutes.
Required Parts
You will need a SenseCAP Watcher, which you can get at Seeed Studio. It comes with a USB-C cable, a stand and a 1/4″ female adapter. You will also need an ESP32. I picked a XIAO ESP32-C5 but any other ESP32 will work as well. Finally, you will need an active buzzer for the Desk Inactivity Monitor project, and some wires and a breadboard to connect everything.

SenseCAP Watcher

XIAO ESP32-C5

USB C Cable

Active Buzzer 5V

Dupont Wire Set

Breadboard
Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.
What is SenseCAP Watcher W1-B?
The SenseCAP Watcher W1-B is a compact, self-contained AI sensing node that combines computer vision, audio interaction, and task automation into a single device.
At the hardware level, the device integrates an ESP32-S3 microcontroller together with a dedicated AI accelerator, a wide-angle camera, microphone, speaker, touchscreen, and wireless connectivity via Wi-Fi and Bluetooth.
This combination allows the Watcher to perform real-time, on-device inference such as object detection or event recognition without relying entirely on the cloud, which improves latency and privacy.

A key concept behind the Watcher is its hybrid AI architecture. Vision and basic logic can run locally on the device, while more advanced processing can be offloaded to external services or large language models via the SenseCraft platform.
Task-based operation and user interaction
Instead of traditional firmware logic, the Watcher operates using a task-flow model. Internally, functions are organized into modular “blocks” that can produce or consume data, similar to how Node-RED works. These blocks are connected to define behavior, for example: detect a person → analyze condition → send notification → trigger external system.
Interaction with the device is multimodal. The built-in camera enables vision-based triggers, while the microphone and speaker support voice commands through a push-to-talk interface. The touchscreen and rotary input provide local control and feedback.
IoT Integration
The Watcher is designed to act as a bridge between AI perception and existing IoT infrastructure. It exposes its data and events through multiple interfaces, including HTTP, UART, and message-based integrations, which allows it to connect to frameworks like Node-RED and Home Assistant.
In a Node-RED setup, the Watcher typically functions as an intelligent event source. Detected events, such as “person detected” or “object missing,” can be sent via HTTP or MQTT into a Node-RED flow, where they can be processed further.
When integrated with Home Assistant, the Watcher becomes a high-level sensor that augments traditional binary sensors. Instead of just reporting motion, it can provide semantic information like identifying specific objects or situations. This allows more advanced automations, such as triggering different actions depending on who enters a room or what activity is detected.
For maker projects, the Watcher can be used as a high-level perception module connected to microcontrollers such as ESP32 boards. For example, it can detect persons or gestures and send structured data via UART or HTTP to trigger LEDs, buzzers, motors, or other hardware.
Typical application scenarios
The flexibility of combining on-device AI with external orchestration enables a wide range of applications. In a smart home environment, the Watcher can detect presence and context, for example recognizing when a person enters a room and automatically adjusting lighting or displaying relevant information. It can also monitor pets or detect unusual situations such as a fall.
In security scenarios, the device can act as an intelligent surveillance node that distinguishes between normal activity and anomalies, reducing false alarms compared to traditional motion sensors. Because processing can happen locally, sensitive image data does not need to leave the device.
In this tutorial we will use the Watcher to build a Desk Inactivity Monitor. The Watcher will monitor the presence of a person sitting at a desk. The person detection information is sent via UART to a connected ESP32, which keeps track of time. If the person stays at the desk for more than 60 minutes without a break, the ESP32 sounds a buzzer as a reminder to exercise.
Hardware of the SenseCAP Watcher
The SenseCAP Watcher is powered by an ESP32-S3 microcontroller running at 240MHz. This chip provides dual-core processing and native support for Wi-Fi and Bluetooth. It includes 8MB of dedicated PSRAM to support memory-intensive applications. The system also features 32MB of flash storage for user firmware and data.

A separate Himax HX6538 AI processor handles advanced vision and vector calculations. This secondary processor includes an additional 16MB of flash memory for AI models.
Visual and Audio Capabilities
The front of the device houses a 1.45-inch circular touchscreen with a resolution of 412×412 pixels. An OV5647 camera sensor provides a 120-degree wide-angle field of view. The camera is set to a fixed focus at a distance of three meters.
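To get a feel for what a 120-degree field of view means in practice, you can estimate the width of the scene the camera covers at a given distance. The little helper below is our own illustration (not part of the Watcher firmware), assuming for simplicity that the 120° spec refers to the horizontal field of view:

```cpp
#include <cmath>

// Rough horizontal coverage of a camera at a given distance:
//   width = 2 * distance * tan(FOV / 2)
// Treats the 120-degree figure as the horizontal field of view.
double coverageWidth(double distanceMeters, double fovDegrees) {
  const double kPi = 3.14159265358979323846;
  return 2.0 * distanceMeters * std::tan(fovDegrees * kPi / 360.0);
}
```

At the 3 m focus distance this works out to roughly 10.4 m of horizontal coverage, so the Watcher sees a fairly wide slice of a typical room.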
Audio input is captured through a single integrated microphone on the board. A built-in 1W speaker provides audio feedback and voice responses. The picture below shows the front and back of the SenseCAP Watcher:

Interaction and Indicator Tools
Users can navigate the internal software using a digital crown wheel located on the side. This wheel supports both scrolling and a button-press function for making selections. A single RGB LED provides status indications such as power or connectivity states. A dedicated reset button is accessible through a small hole at the bottom of the case. The device also includes a microSD card slot for expandable storage up to 32GB.
Connectivity and Power
Wireless communication is handled via 2.4GHz Wi-Fi and Bluetooth 5.0. The unit features two USB-C ports for different mounting and power scenarios. The bottom port supports both 5V power and serial programming for development. The back port is reserved for 5V power delivery only. A 400mAh lithium-ion battery serves as a backup power source for short-term use.
For external hardware, the Watcher includes a Grove I2C port and a 2×4 female header for GPIO expansion. The picture below shows the different connectors on the back of the SenseCAP Watcher:

Note that the 5V pin is an input pin, while the 3V3 pin is an output pin. Don't supply power to the 5V input pin while the Watcher is also powered via its USB port.
Technical Specification
The following table summarizes the technical specification of the SenseCAP Watcher:
| Hardware | Description |
|---|---|
| MCU | ESP32-S3 @240MHz 8MB PSRAM |
| Built-in AI Processor | Himax HX6538 (Cortex M55 + Ethos-U55) |
| Camera | OV5647 120° FOV Fixed Focal 3 meters |
| Wi-Fi | IEEE 802.11b/g/n-compliant 2.4GHz Band Wireless Range: Up to 100 meters (open space test) |
| Bluetooth LE | Bluetooth 5 |
| Antenna | Built-in Wi-Fi and BLE antenna |
| Display | 1.45-inch touchscreen, 412×412 resolution |
| Microphone | Single microphone |
| Speaker | 1W speaker output |
| Wheel | Supports scrolling up/down and button press |
| LED | 1xRGB light for indication |
| microSD Card Slot | Supports up to 32GB FAT32 microSD card |
| Flash | 32MB Flash for ESP32-S3 16MB Flash for Himax HX6538 |
| Extension Interface | 1×Grove I2C interface, 2×4 female header (1×I2C, 2×GPIO, 2×GND, 1×3.3V_OUT, 1×5V_IN) |
| USB-C | 1×USB-C on the back (power supply only), 1×USB-C on the bottom (power supply and programming) |
| Reset Button | 1xRST button in the bottom hole |
| Power Supply | 5V DC power |
| Battery | 3.7V 400mAh Li-ion battery as backup power |
| Operating Temperature | 0 ~ 45°C |
Connecting the SenseCAP Watcher to an ESP32
The SenseCAP Watcher offers various methods to transmit detection information to other systems, such as Node-RED or Home Assistant, for further processing. However, this requires that you run a Node-RED or Home Assistant server.
For a small, local detection system a better option is to connect the SenseCAP Watcher to another microcontroller that evaluates detection results and then performs actions, e.g. sounding an alarm. This can be achieved by using the serial interface (UART) of the Watcher.
On the back of the Watcher you will find an 8-pin connector with I2C (SCL, SDA), UART (RX, TX) and power interfaces. The wiring diagram below shows you how to connect the SenseCAP Watcher to an XIAO ESP32-C5 board via UART:

Start by connecting TX of the Watcher to pin D7 (RX) of the XIAO ESP32-C5. Next connect the RX of the Watcher to pin D6 (TX) of the ESP32-C5. We will power the Watcher from the ESP32-C5 by connecting the 5V and the GND pins. The following table shows you the connections you need to make:
| Watcher | ESP32-C5 |
|---|---|
| RX | D6/TX |
| TX | D7/RX |
| 5V | 5V |
| GND | GND |
Make sure to use the USB-C port on the ESP32-C5 to power the circuit. This will provide power for the ESP32-C5 and the Watcher together. Do not connect the USB-C port of the Watcher.
Creating a Task with UART notification
Sending detection results from the Watcher via UART to a connected ESP32 must be enabled on a per-task basis in the SenseCraft APP. In this section you will learn how to do this, but you need to have the SenseCraft APP installed on your phone and connected to your SenseCAP Watcher. If not, read the Quick Start Guide first and follow the instructions there.
Creating a detection task in the SenseCraft APP is easy. For instance, type “Notify via uart if person detected” to create a person detection task:

Once the task is created, click on the “Detail Configs” button to open the Manual Configuration dialog. There you will find multiple check boxes. Make sure that “Serial Port / UART Output” is checked:

Most of the other check boxes are checked by default, but we won't need them for the serial communication. For more detailed information see the UART Output section of Seeed Studio's documentation for the SenseCAP Watcher.
Finally, press the “Run Task” button at the bottom of the dialog to start the task.
Code Example: Read detection results
Once the task is running and the Watcher is connected via UART to the ESP32-C5, we can test the transmission of detection results. Upload the following code to your ESP32-C5, which needs to be connected to your PC via the USB-C cable. Do not connect the Watcher via USB!
```cpp
#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100); // 100K

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6); // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}

void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.println(doc["inference"].as<String>());
    }
  }
}
```
The above code receives the JSON-formatted detection data sent by the Watcher over the serial connection and prints the information to the Serial Monitor.
Imports
The code begins by including the ArduinoJson library, which is essential for parsing JSON data received from the Watcher. This library simplifies handling structured data in JSON format.
```cpp
#include <ArduinoJson.h>
```
JSON Document Object
Next, a DynamicJsonDocument named doc is declared with a capacity of 100 kilobytes (1024 * 100 bytes). This object will hold the parsed JSON data received from the Watcher. The size is chosen to accommodate the largest JSON messages sent by the Watcher.
```cpp
DynamicJsonDocument doc(1024 * 100); // 100K
```
Setup Function
In the setup() function, two serial interfaces are initialized. The first, Serial, is started at 115200 baud for communication with the Serial Monitor of the Arduino IDE. The second, Serial1, is also started at 115200 baud but configured with pins D7 (RX) and D6 (TX) to communicate with the SenseCAP Watcher.
The code waits until the USB serial connection is established before proceeding, ensuring that debug messages can be seen immediately. After a short delay of 100 milliseconds, it prints “Ready.” to indicate that the ESP32 is prepared to receive data.
```cpp
void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6); // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}
```
Loop Function
The loop() function continuously checks if there is any data available on Serial1, which is connected to the Watcher. When data is detected, it attempts to deserialize the incoming JSON stream into the doc object.
If the parsed JSON contains the key "inference", the corresponding value is extracted as a string and printed to the Serial Monitor.
```cpp
void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.println(doc["inference"].as<String>());
    }
  }
}
```
Output Example
When you run this code on the ESP32-C5 you should see detection results similar to the following ones appearing on the Serial Monitor:

If not, check the wiring and make sure that “Serial Port / UART Output” is enabled for the detection task.
The data sent will depend on the detection model. For the person detection task you always receive the bounding boxes for the detected persons, the confidence value, the class id, and the list of class names, in this case only ["person"].
If you ran the Gesture detection model, which detects Rock, Paper, or Scissors gestures, the data sent by the Watcher would look like this, for instance:
```json
{"boxes":[[176,208,144,218,83,0]],"classes_name":["Paper","Rock","Scissors"]}
```
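To turn the numeric class id from a box entry back into a label, you can index into the classes_name list. Below is a minimal sketch in plain C++ (the function name is our own, not part of any Watcher API); for the gesture example above, a cls_id of 0 maps to "Paper":

```cpp
#include <string>
#include <vector>

// Look up the label for a numeric class id (the last element of a "boxes"
// entry) in the model's "classes_name" array. Returns "unknown" for ids
// outside the list. This helper is our own, not part of any Watcher API.
std::string classLabel(int clsId, const std::vector<std::string>& classesName) {
  if (clsId < 0 || clsId >= static_cast<int>(classesName.size()))
    return "unknown";
  return classesName[clsId];
}
```

The bounds check matters because a corrupted or truncated serial message could otherwise produce an out-of-range index.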
Code Example: Extracting detection data
In the previous code example we interpreted and printed the detection data transmitted by the Watcher as a string. However, in many cases you want to extract specific information as numerical data, e.g. the width and the height of the bounding box.
The following code example shows you how to do this. It extracts the data from the JSON document and prints it as numerical values:
```cpp
#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100); // 100K

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6); // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}

void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      JsonArray b = doc["inference"]["boxes"][0].as<JsonArray>();
      Serial.printf("x:%d y:%d w:%d h:%d | score:%d cls_id:%d\n",
                    b[0].as<int>(), b[1].as<int>(), b[2].as<int>(), b[3].as<int>(),
                    b[4].as<int>(), b[5].as<int>());
    }
  }
}
```
The code is identical to the previous example apart from the printing of the detection data. Instead of printing the data as a string (doc["inference"].as<String>()), it extracts the first bounding box from the "boxes" array within the inference data.
This bounding box is a JSON array containing six integers representing the detected object’s coordinates and metadata: x, y, width, height, score (confidence), and cls_id (class ID). These values are printed in a formatted string. Here is an output example.
```
x:131 y:290 w:240 h:208 | score:71 cls_id:0
```
Once you have the detection results as numerical data you can perform calculations. For instance, you could roughly estimate the distance of a person to the Watcher from the size of the bounding box. Since the apparent size of a person shrinks as they move away, the distance is inversely related to the box dimensions:

```cpp
distance = c / sqrt(b[2].as<int>() * b[3].as<int>());
```

where b[2] and b[3] contain the width and height of the bounding box, and c is a calibration constant that you determine once, e.g. from the box size of a person at a known distance.
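As a concrete sketch of such an estimate, here is an illustrative helper of our own (the value of c is an arbitrary placeholder that you would calibrate yourself):

```cpp
#include <cmath>

// Rough distance estimate from the bounding-box area. With a pinhole camera
// the apparent width and height shrink in proportion to the distance, so the
// square root of the box area is roughly proportional to 1/distance:
//   distance ~ c / sqrt(w * h)
// The constant c (a placeholder here) must be calibrated once, e.g. from the
// box size of a person standing at a known distance.
double approxDistanceMeters(int boxWidth, int boxHeight, double c = 700.0) {
  if (boxWidth <= 0 || boxHeight <= 0) return -1.0;  // invalid box
  return c / std::sqrt(static_cast<double>(boxWidth) * boxHeight);
}
```

A smaller box therefore yields a larger distance estimate, matching the intuition that a person who appears small in the frame is further away.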
Code Example: Desk Inactivity Monitor
In the final code example we will build a Desk Inactivity Monitor. The SenseCAP Watcher will be placed on a desk and will continuously run the person detection task. The detection data is periodically sent to the connected ESP32-C5, which runs a timer. If the person is at their desk for more than 60 minutes without an interruption the ESP32-C5 sounds a buzzer to encourage the person to take a break.
Since the ESP32-C5 has no built-in buzzer, we need to connect an external one. This is easy. Just connect the negative pole of the buzzer to GND and the positive pole to the D0 pin as shown below.

The other connections remain as before. The table below lists all the connections you need to make.
| Watcher | ESP32-C5 | Buzzer |
|---|---|---|
| RX | D6/TX | |
| TX | D7/RX | |
| 5V | 5V | |
| GND | GND | GND |
| | D0 | + |
If you need more help with the buzzer, have a look at our Active and Passive Piezo Buzzers with Arduino tutorial.
Ensure that you connect the buzzer in the correct polarity and that it is an active buzzer – otherwise the following code for the Desk Inactivity Monitor won’t work.
```cpp
#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100); // 100K

const int BUZZER_PIN = D0; // GPIO 1
const unsigned long MAX_SIT_TIME_MS = 1000UL * 60 * 60;

unsigned long tPerson = 0;
unsigned long tSitting = 0;
bool personDetected = false;

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6); // RX, TX
  pinMode(BUZZER_PIN, OUTPUT);
  digitalWrite(BUZZER_PIN, LOW);
  while (!Serial)
    ;
  delay(100);
  tPerson = millis();
  tSitting = millis();
  Serial.println("running...");
}

void loop() {
  unsigned long tCurrent = millis();
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.printf("person sitting %lu sec\n", (tCurrent - tSitting) / 1000);
      tPerson = tCurrent;
      if (!personDetected)
        tSitting = tCurrent;
      personDetected = true;
    }
  }
  if ((tCurrent - tPerson) > 1000UL * 10) {
    personDetected = false;
    tSitting = tCurrent;
  }
  if (personDetected && ((tCurrent - tSitting) >= MAX_SIT_TIME_MS)) {
    tSitting = tCurrent;
    digitalWrite(BUZZER_PIN, HIGH);
    Serial.println("Take a break now!");
    delay(1000);
    digitalWrite(BUZZER_PIN, LOW);
  }
}
```
Imports
As before, we start by including the ArduinoJson library, which is used to parse JSON data received from the Watcher.
```cpp
#include <ArduinoJson.h>
```
JSON Document
Then a DynamicJsonDocument named doc is created with a capacity of 100 kilobytes. This document will hold the parsed JSON data received from the Watcher.
```cpp
DynamicJsonDocument doc(1024 * 100); // 100K
```
Constants
The constant BUZZER_PIN defines the GPIO pin connected to the buzzer, here set to D0 (which corresponds to GPIO 1 on the ESP32-C5). Another constant, MAX_SIT_TIME_MS, defines the maximum allowed sitting time in milliseconds, set to 60 minutes (1000 ms × 60 s × 60 min).
```cpp
const int BUZZER_PIN = D0; // GPIO 1
const unsigned long MAX_SIT_TIME_MS = 1000UL * 60 * 60;
```
Variables
We use several variables to keep track of timing and detection state. tPerson stores the last time a person was detected, tSitting records when the sitting period started, and personDetected is a boolean flag indicating whether a person is currently detected.
```cpp
unsigned long tPerson = 0;
unsigned long tSitting = 0;
bool personDetected = false;
```
Setup Function
The setup() function initializes serial communication for debugging (Serial) and for communication with the Watcher (Serial1) at 115200 baud. The buzzer pin is configured as an output and initially turned off. The code waits for the serial port to be ready, then initializes the timing variables tPerson and tSitting with the current time in milliseconds. Finally, it prints “running…” to the serial monitor to indicate that the program has started.
```cpp
void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6); // RX, TX
  pinMode(BUZZER_PIN, OUTPUT);
  digitalWrite(BUZZER_PIN, LOW);
  while (!Serial)
    ;
  delay(100);
  tPerson = millis();
  tSitting = millis();
  Serial.println("running...");
}
```
Loop Function
The loop() function runs repeatedly and performs the core monitoring logic. It first reads the current time in milliseconds into tCurrent.
If data is available on Serial1 (from the Watcher), it attempts to parse the incoming JSON into doc. If the JSON contains the key "inference", a person has been detected. The code then prints how many seconds the person has been there, updates tPerson to the current time, and, if this is the first detection after a period of no detection, resets tSitting to start timing the sitting duration. The personDetected flag is set to true.
If more than 10 seconds have passed since the last person detection (tCurrent - tPerson > 10000), the code assumes the person has left, sets personDetected to false, and resets tSitting to the current time.
Finally, if a person is detected and the sitting time has reached or exceeded the maximum allowed time (MAX_SIT_TIME_MS), the buzzer is activated for 1 second (1000 ms) to remind the user to take a break. The sitting timer tSitting is reset after the buzzer sounds to start a new monitoring period.
```cpp
void loop() {
  unsigned long tCurrent = millis();
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.printf("person sitting %lu sec\n", (tCurrent - tSitting) / 1000);
      tPerson = tCurrent;
      if (!personDetected)
        tSitting = tCurrent;
      personDetected = true;
    }
  }
  if ((tCurrent - tPerson) > 1000UL * 10) {
    personDetected = false;
    tSitting = tCurrent;
  }
  if (personDetected && ((tCurrent - tSitting) >= MAX_SIT_TIME_MS)) {
    tSitting = tCurrent;
    digitalWrite(BUZZER_PIN, HIGH);
    Serial.println("Take a break now!");
    delay(1000);
    digitalWrite(BUZZER_PIN, LOW);
  }
}
```
You can test the code by reducing MAX_SIT_TIME_MS to one minute (1000 * 60). If you sit at your desk for more than 60 seconds, the buzzer should sound. If you leave your desk before that, the timer will reset.
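If you want to verify the timing logic without the hardware, the same state machine can be lifted into a plain C++ struct that is fed simulated timestamps instead of calling millis(). This is a sketch with our own names, mirroring the logic of the sketch above:

```cpp
// Host-side re-implementation of the Desk Inactivity Monitor timing logic so
// it can be tested without the Watcher or an ESP32. Timestamps are passed in
// explicitly instead of reading millis(); all names here are our own.
struct InactivityMonitor {
  unsigned long maxSitMs;
  unsigned long tPerson = 0;   // last time a person was seen
  unsigned long tSitting = 0;  // start of the current sitting period
  bool personDetected = false;

  explicit InactivityMonitor(unsigned long maxMs) : maxSitMs(maxMs) {}

  // Call when a detection message arrives; mirrors the "inference" branch.
  void onDetection(unsigned long now) {
    tPerson = now;
    if (!personDetected) tSitting = now;
    personDetected = true;
  }

  // Call every loop iteration; returns true when the buzzer should sound.
  bool update(unsigned long now) {
    if (now - tPerson > 10000UL) {  // no detection for 10 s: person left
      personDetected = false;
      tSitting = now;
    }
    if (personDetected && now - tSitting >= maxSitMs) {
      tSitting = now;               // restart the period after alerting
      return true;
    }
    return false;
  }
};
```

Separating the timing logic from the hardware I/O like this makes it easy to step through edge cases (person leaves briefly, limit reached twice in a row) on your PC before flashing the board.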
Note that the person detection model just detects the presence of a person but not whether the person is sitting or not. However, you could use a skeletal model to detect specific poses and count only the time a person is actually sitting, for instance.
Conclusions
The SenseCAP Watcher W1-B combines an ESP32-S3 platform with integrated vision, audio, and AI capabilities. It enables you to build intelligent sensors at the edge. The task-based architecture and flexible connectivity make it straightforward to integrate with established tools such as Node-RED and Home Assistant.
For makers working with the Arduino or ESP32 ecosystems, the Watcher can act as a high-level AI co-processor that offloads the compute-intensive AI sensing tasks. The application-specific and potentially complex evaluation logic can be implemented on a separate microcontroller that communicates with the Watcher via the serial interface. Note, however, that the SenseCAP Watcher operates with 3.3 volt logic, so a level shifter is required to connect it to a 5V Arduino UNO, for instance.
As an AI vision sensor the SenseCAP Watcher is similar to the HUSKYLENS or the HUSKYLENS 2 devices. However, the Watcher is more of a self-contained AI agent that can interpret scenes and act autonomously, whereas HUSKYLENS and HUSKYLENS 2 are vision sensors designed to offload the interpretation of and reaction to detected objects to a microcontroller via UART/I2C interfaces.
If you have any questions feel free to leave them in the comment section.
Happy Tinkering 😉
Links
Here is a list of links that I found useful when writing this tutorial:
- SenseCAP Watcher Wiki: The primary source for hardware overviews and operation guidelines.
- SenseCraft AI Workspace: The web-based portal for no-code model deployment and firmware updates.
- Seeed Studio Forum: The official community space for troubleshooting and sharing Watcher projects.
- Watcher GitHub Repository: Access open-source hardware schematics and core firmware files.
- Home Assistant Integration Guide: Step-by-step instructions for connecting the Watcher to your local automation server.
- Node-RED Integration: Learn how to push AI event data to custom dashboards and external APIs.
Stefan is a professional software developer and researcher. He has worked in robotics, bioinformatics, image/audio processing and education at Siemens, IBM and Google. He specializes in AI and machine learning and has a keen interest in DIY projects involving Arduino and 3D printing.

