Interfacing SenseCAP Watcher W1-B with ESP32

The SenseCAP Watcher is an AI-powered assistant from Seeed Studio. It utilizes the ESP32-S3 microcontroller to provide local vision and voice processing.

The Watcher features a 1.45-inch circular touch screen, an integrated camera, a microphone and a built-in speaker. It provides Wi-Fi and Bluetooth connectivity and is expandable through a Grove connector, an expansion header with a serial connection and a microSD card slot.

SenseCAP Watcher (source)

In this tutorial you will learn how to connect the SenseCAP Watcher to an ESP32 to receive detection results via the serial interface and react to them. We will build a Desk Inactivity Monitor that encourages you to exercise if you have been at your desk for more than 60 minutes.

Required Parts

You will need a SenseCAP Watcher, which you can get at Seeed Studio. It comes with a USB-C cable, a stand and a 1/4″ female adapter. You will also need an ESP32. I picked a XIAO ESP32-C5 but any other ESP32 will work as well. Finally, you will need an active buzzer for the Desk Inactivity Monitor project, and some wires and a breadboard to connect everything.

SenseCAP Watcher

XIAO ESP32-C5

USB C Cable

Active Buzzer 5V

Dupont Wire Set

Breadboard

What is SenseCAP Watcher W1-B?

The SenseCAP Watcher W1-B is a compact, self-contained AI sensing node that combines computer vision, audio interaction, and task automation into a single device.

At the hardware level, the device integrates an ESP32-S3 microcontroller together with a dedicated AI accelerator, a wide-angle camera, microphone, speaker, touchscreen, and wireless connectivity via Wi-Fi and Bluetooth.

This combination allows the Watcher to perform real-time, on-device inference such as object detection or event recognition without relying entirely on the cloud, which improves latency and privacy.

SenseCraft platform (source)

A key concept behind the Watcher is its hybrid AI architecture. Vision and basic logic can run locally on the device, while more advanced processing can be offloaded to external services or large language models via the SenseCraft platform.

Task-based operation and user interaction

Instead of traditional firmware logic, the Watcher operates using a task-flow model. Internally, functions are organized into modular “blocks” that can produce or consume data, similar to how Node-RED works. These blocks are connected to define behavior, for example: detect a person → analyze condition → send notification → trigger external system.

Interaction with the device is multimodal. The built-in camera enables vision-based triggers, while the microphone and speaker support voice commands through a push-to-talk interface. The touchscreen and rotary input provide local control and feedback.

IoT Integration

The Watcher is designed to act as a bridge between AI perception and existing IoT infrastructure. It exposes its data and events through multiple interfaces, including HTTP, UART, and message-based integrations, which allows it to connect to frameworks like Node-RED and Home Assistant.

In a Node-RED setup, the Watcher typically functions as an intelligent event source. Detected events, such as “person detected” or “object missing,” can be sent via HTTP or MQTT into a Node-RED flow, where they can be processed further.

When integrated with Home Assistant, the Watcher becomes a high-level sensor that augments traditional binary sensors. Instead of just reporting motion, it can provide semantic information like identifying specific objects or situations. This allows more advanced automations, such as triggering different actions depending on who enters a room or what activity is detected.

For maker projects, the Watcher can be used as a high-level perception module connected to microcontrollers such as ESP32 boards. For example, it can detect persons or gestures and send structured data via UART or HTTP to trigger LEDs, buzzers, motors, or other hardware.

Typical application scenarios

The flexibility of combining on-device AI with external orchestration enables a wide range of applications. In a smart home environment, the Watcher can detect presence and context, for example recognizing when a person enters a room and automatically adjusting lighting or displaying relevant information. It can also monitor pets or detect unusual situations such as a fall.

In security scenarios, the device can act as an intelligent surveillance node that distinguishes between normal activity and anomalies, reducing false alarms compared to traditional motion sensors. Because processing can happen locally, sensitive image data does not need to leave the device.

In this tutorial we will use the Watcher to build a Desk Inactivity Monitor. The Watcher will monitor the presence of a person sitting at a desk. The person detection information will be sent via UART to a connected ESP32, which keeps track of time. If the person does not leave the desk for more than 60 minutes the ESP32 sounds a buzzer as a reminder to exercise.

Hardware of the SenseCAP Watcher

The SenseCAP Watcher is powered by an ESP32-S3 microcontroller running at 240MHz. This chip provides dual-core processing and native support for Wi-Fi and Bluetooth. It includes 8MB of dedicated PSRAM to support memory-intensive applications. The system also features 32MB of flash storage for user firmware and data.

Architecture of the SenseCAP Watcher (source)

A separate Himax HX6538 AI processor handles advanced vision and vector calculations. This secondary processor includes an additional 16MB of flash memory for AI models.

Visual and Audio Capabilities

The front of the device houses a 1.45-inch circular touchscreen with a resolution of 412×412 pixels. An OV5647 camera sensor provides a 120-degree wide-angle field of view. The camera is set to a fixed focus at a distance of three meters.

Audio input is captured through a single integrated microphone on the board. A built-in 1W speaker provides audio feedback and voice responses. The picture below shows the front and back of the SenseCAP Watcher:

Front and Back of the SenseCAP Watcher (source)

Interaction and Indicator Tools

Users can navigate the internal software using a digital crown wheel located on the side. This wheel supports both scrolling and a button-press function for making selections. A single RGB LED provides status indications such as power or connectivity states. A dedicated reset button is accessible through a small hole at the bottom of the case. The device also includes a microSD card slot for expandable storage up to 32GB.

Connectivity and Power

Wireless communication is handled via 2.4GHz Wi-Fi and Bluetooth 5.0. The unit features two USB-C ports for different mounting and power scenarios. The bottom port supports both 5V power and serial programming for development. The back port is reserved for 5V power delivery only. A 400mAh lithium-ion battery serves as a backup power source for short-term use.

For external hardware, the Watcher includes a Grove I2C port and a 2×4 female header for GPIO expansion. The picture below shows the different connectors on the back of the SenseCAP Watcher:

Interfaces of the SenseCAP Watcher (source)

Note that the 5V pin is an input pin, while the 3V3 pin is an output pin. Do not supply power through the 5V input pin while the Watcher is also powered via a USB port.

Technical Specification

The following table summarizes the technical specification of the SenseCAP Watcher:

| Hardware | Description |
|---|---|
| MCU | ESP32-S3 @240MHz, 8MB PSRAM |
| Built-in AI Processor | Himax HX6538 (Cortex M55 + Ethos-U55) |
| Camera | OV5647, 120° FOV, fixed focus at 3 meters |
| Wi-Fi | IEEE 802.11b/g/n-compliant, 2.4GHz band, wireless range up to 100 meters (open space test) |
| Bluetooth LE | Bluetooth 5 |
| Antenna | Built-in Wi-Fi and BLE antenna |
| Display | 1.45-inch touchscreen, 412×412 resolution |
| Microphone | Single microphone |
| Speaker | 1W speaker output |
| Wheel | Supports scrolling up & down and button |
| LED | 1x RGB light for indication |
| microSD Card Slot | Supports up to 32GB FAT32 microSD card |
| Flash | 32MB Flash for ESP32-S3, 16MB Flash for Himax HX6538 |
| Extension Interface | 1x Grove IIC interface, 2×4 female header (1x IIC, 2x GPIO, 2x GND, 1x 3.3V_OUT, 1x 5V_IN) |
| USB-C | 1x USB-C on the back (power supply only), 1x USB-C on the bottom (power supply and programming) |
| Reset Button | 1x RST button in the bottom hole |
| Power Supply | 5V DC power |
| Battery | 3.7V 400mAh Li-ion battery as backup power |
| Operating Temperature | 0 ~ 45°C |

Connecting the SenseCAP Watcher to an ESP32

The SenseCAP Watcher offers various methods to transmit detection information to other systems, such as Node-RED or Home Assistant, for processing. However, this requires that you run a Node-RED or Home Assistant server.

For a small, local detection system a better option is to connect the SenseCAP Watcher to another microcontroller that evaluates detection results and then performs actions, e.g. sounding an alarm. This can be achieved by using the serial interface (UART) of the Watcher.

On the back of the Watcher you will find an 8-pin connector with I2C (SCL, SDA), UART (RX, TX) and power interfaces. The wiring diagram below shows you how to connect the SenseCAP Watcher to a XIAO ESP32-C5 board via UART:

Connecting SenseCAP Watcher to an XIAO ESP32-C5

Start by connecting TX of the Watcher to pin D7 (RX) of the XIAO ESP32-C5. Next connect the RX of the Watcher to pin D6 (TX) of the ESP32-C5. We will power the Watcher from the ESP32-C5 by connecting the 5V and the GND pins. The following table shows you the connections you need to make:

| Watcher | ESP32-C5 |
|---|---|
| RX | D6/TX |
| TX | D7/RX |
| 5V | 5V |
| GND | GND |

Make sure to use the USB-C port on the ESP32-C5 to power the circuit. This will provide power for the ESP32-C5 and the Watcher together. Do not connect the USB-C port of the Watcher.

Creating a Task with UART notification

Sending detection results from the Watcher via UART to a connected ESP32 must be enabled for each task in the SenseCraft APP. This section shows you how to do that. It assumes that the SenseCraft APP is installed on your phone and connected to your SenseCAP Watcher. If it is not, read the Quick Start Guide first and follow the instructions there.

Creating a detection task in the SenseCraft APP is easy. For instance, type “Notify via uart if person detected” to create a person detection task:

Create Task with UART notification

Once the task is created, click on the “Detail Configs” button to open the Manual Configuration dialog. There you will find multiple check boxes. Make sure that “Serial Port / UART Output” is checked:

Configure Task with UART notification

Most of the other check boxes will be checked by default but for using the serial communication we won’t need them. For more detailed information see the UART Output section of Seeed Studio’s documentation for the SenseCAP Watcher.

Finally, press the “Run Task” button at the bottom of the dialog to start the task.

Code Example: Read detection results

Once the task is running and the Watcher is connected via UART to the ESP32-C5, we can test the transmission of detection results. Connect the ESP32-C5 to your PC with the USB-C cable and upload the following code. Do not connect the Watcher via USB!

#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100); // 100K

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6);  // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}

void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.println(doc["inference"].as<String>());
    }
  }
}

The above code receives the JSON-formatted detection data sent by the Watcher over the serial connection and prints the information to the Serial Monitor.

Imports

The code begins by including the ArduinoJson library, which is essential for parsing JSON data received from the Watcher. This library simplifies handling structured data in JSON format.

#include <ArduinoJson.h>

JSON Document Object

Next, a DynamicJsonDocument named doc is declared with a capacity of 100 kilobytes (1024 * 100 bytes). This object will hold the parsed JSON data received from the Watcher. The size is chosen to accommodate the maximum size of JSON messages sent from the Watcher.

DynamicJsonDocument doc(1024 * 100); // 100K

Setup Function

In the setup() function, two serial interfaces are initialized. The first, Serial, is started at 115200 baud for communication with the Serial Monitor of the Arduino IDE. The second, Serial1, is also started at 115200 baud but configured with the pins D7 (RX) and D6 (TX) to communicate with the SenseCAP Watcher.

The code waits until the USB serial connection is established before proceeding, ensuring that debug messages can be seen immediately. After a short delay of 100 milliseconds, it prints “Ready.” to indicate that the ESP32 is prepared to receive data.

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6);  // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}

Loop Function

The loop() function continuously checks if there is any data available on Serial1, which is connected to the Watcher. When data is detected, it attempts to deserialize the incoming JSON stream into the doc object.

If the parsed JSON contains the key "inference", the corresponding value is extracted as a string and printed to the Serial Monitor.

void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.println(doc["inference"].as<String>());
    }
  }
}

Output Example

When you run this code on the ESP32-C5 you should see detection results similar to the following ones appearing on the Serial Monitor:

If not, check the wiring and make sure that “Serial Port / UART Output” is enabled for the detection task.

The data sent will depend on the detection model. For the person detection task you always receive the bounding boxes for the detected persons, the confidence value, the class id and the list of class names, in this case only [“person”].

If you ran the Gesture detection model, which detects Rock, Paper or Scissors gestures, the data sent by the Watcher would look like this, for instance:

{"boxes":[[176,208,144,218,83,0]],"classes_name":["Paper","Rock","Scissors"]}
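
Judging by this example, the last value in each box (here 0) is the class id, and it appears to index into the classes_name list, so this detection would be a “Paper” gesture. The following is a minimal sketch of that lookup in plain C++ without Arduino dependencies, so it can be tried on a PC. The Detection struct and the helpers fromBox and labelFor are hypothetical names introduced for illustration, and the field order (x, y, width, height, score, class id) is an assumption based on the box layout used in this tutorial:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one entry of the "boxes" array.
// Assumed field order: x, y, width, height, score, class id.
struct Detection {
  int x, y, w, h, score, classId;
};

// Convert one six-element box into a Detection.
Detection fromBox(const std::array<int, 6>& box) {
  return Detection{box[0], box[1], box[2], box[3], box[4], box[5]};
}

// Look up the class name of a detection. Returns "unknown" if the
// class id does not index into the list of names.
std::string labelFor(const Detection& d, const std::vector<std::string>& names) {
  if (d.classId < 0 || static_cast<std::size_t>(d.classId) >= names.size()) {
    return "unknown";
  }
  return names[static_cast<std::size_t>(d.classId)];
}
```

In an Arduino sketch the six values and the class names would come from the parsed JSON document instead of hard-coded containers.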

Code Example: Extracting detection data

In the previous code example we interpreted and printed the detection data transmitted by the Watcher as a string. However, in many cases you want to extract specific information as numerical data, e.g. the width and the height of the bounding box.

The following code example shows you how to do this. It extracts the data from the JSON document and prints it as numerical values:

#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100); // 100K

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6);  // RX, TX
  while (!Serial)
    ;
  delay(100);
  Serial.println("Ready.");
}

void loop() {
  if (Serial1.available()) {
    deserializeJson(doc, Serial1);

    if (doc.containsKey("inference")) {
      JsonArray b = doc["inference"]["boxes"][0].as<JsonArray>();
      Serial.printf("x:%d y:%d w:%d h:%d | score:%d cls_id:%d\n",
                    b[0].as<int>(), b[1].as<int>(), b[2].as<int>(), b[3].as<int>(), 
                    b[4].as<int>(), b[5].as<int>());
    }
  }
}

The code is identical to the previous example apart from the printing of the detection data. Instead of printing the data as a string (doc["inference"].as<String>()), it extracts the first bounding box from the "boxes" array within the inference data.

This bounding box is a JSON array containing six integers representing the detected object’s coordinates and metadata: x, y, width, height, score (confidence), and cls_id (class ID). These values are printed in a formatted string. Here is an output example:

x:131 y:290 w:240 h:208 | score:71 cls_id:0

Once you have the detection results as numerical data you can perform calculations. For instance, a person who is closer to the Watcher produces a larger bounding box, so you could roughly approximate the distance of a person to the Watcher by computing

distance = c / sqrt(b[2].as<float>() * b[3].as<float>());

where b[2] and b[3] contain the width and height of the bounding box, and c is a constant that converts the measurement into some distance unit and has to be determined by calibration.
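
Another calculation that can be useful: the example above reads only the first entry of "boxes". If several persons are in view, you may prefer to pick the largest box that has a sufficient score. The following plain C++ sketch shows one way to do that. The function pickLargest and the minScore threshold are assumptions made for illustration and are not part of the Watcher’s output format or of any library:

```cpp
#include <array>
#include <vector>

using Box = std::array<int, 6>;  // x, y, width, height, score, class id

// Return the index of the largest box (by area) whose score is at least
// minScore, or -1 if no box qualifies. minScore is a hypothetical threshold.
int pickLargest(const std::vector<Box>& boxes, int minScore) {
  int best = -1;
  long bestArea = -1;
  for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
    const Box& b = boxes[i];
    if (b[4] < minScore) continue;               // skip low-confidence boxes
    long area = static_cast<long>(b[2]) * b[3];  // width * height
    if (area > bestArea) {
      bestArea = area;
      best = i;
    }
  }
  return best;
}
```

On the ESP32 the same loop could run over the JsonArray stored under "boxes" instead of a std::vector.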

Code Example: Desk Inactivity Monitor

In the final code example we will build a Desk Inactivity Monitor. The SenseCAP Watcher will be placed on a desk and will continuously run the person detection task. The detection data is periodically sent to the connected ESP32-C5, which runs a timer. If the person is at their desk for more than 60 minutes without an interruption the ESP32-C5 sounds a buzzer to encourage the person to take a break.

Since the ESP32-C5 has no built-in buzzer, we need to connect an external one. This is easy. Just connect the negative pole of the buzzer to GND and the positive pole to the D0 pin as shown below.

Connecting SenseCAP Watcher to an XIAO ESP32-C5 with a buzzer

The other connections remain as before. The table below lists all the connections you need to make.

| Watcher | ESP32-C5 | Buzzer |
|---|---|---|
| RX | D6/TX | |
| TX | D7/RX | |
| 5V | 5V | |
| GND | GND | GND |
| | D0 | + |

If you need more help with buzzers, have a look at our Active and Passive Piezo Buzzers with Arduino tutorial.

Ensure that you connect the buzzer in the correct polarity and that it is an active buzzer – otherwise the following code for the Desk Inactivity Monitor won’t work.

#include <ArduinoJson.h>

DynamicJsonDocument doc(1024 * 100);  // 100K

const int BUZZER_PIN = D0; // GPIO 1
const unsigned long MAX_SIT_TIME_MS = 1000 * 60 * 60;

unsigned long tPerson = 0;
unsigned long tSitting = 0;
bool personDetected = false;

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6);  // RX, TX
  pinMode(BUZZER_PIN, OUTPUT);
  digitalWrite(BUZZER_PIN, LOW);

  while (!Serial)
    ;
  delay(100);
  tPerson = millis();
  tSitting = millis();
  Serial.println("running...");
}

void loop() {
  unsigned long tCurrent = millis();

  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.printf("person sitting %lu sec\n", (tCurrent - tSitting)/1000);
      tPerson = tCurrent;
      if (!personDetected)
        tSitting = tCurrent;
      personDetected = true;
    }
  }

  if ((tCurrent - tPerson) > 1000 * 10) {
    personDetected = false;
    tSitting = tCurrent;
  }

  if (personDetected && ((tCurrent - tSitting) >= MAX_SIT_TIME_MS)) {
    tSitting = tCurrent;
    digitalWrite(BUZZER_PIN, HIGH);
    Serial.println("Take a break now!");
    delay(1000);
    digitalWrite(BUZZER_PIN, LOW);
  }
}

Imports

As before, we start by including the ArduinoJson library, which is used to parse JSON data received from the Watcher.

#include <ArduinoJson.h>

JSON Document

Then a DynamicJsonDocument named doc is created with a capacity of 100 kilobytes. This document will hold the parsed JSON data received from the Watcher.

DynamicJsonDocument doc(1024 * 100);  // 100K

Constants

The constant BUZZER_PIN defines the GPIO pin connected to the buzzer, here set to D0 (which corresponds to GPIO 1 on the ESP32-C5). Another constant, MAX_SIT_TIME_MS, defines the maximum allowed sitting time in milliseconds, set to 60 minutes (1000 ms × 60 s × 60 min).

const int BUZZER_PIN = D0; // GPIO 1
const unsigned long MAX_SIT_TIME_MS = 1000 * 60 * 60;

Variables

We use several variables to keep track of timing and detection state. tPerson stores the last time a person was detected, tSitting records when the sitting period started, and personDetected is a boolean flag indicating whether a person is currently detected.

unsigned long tPerson = 0;
unsigned long tSitting = 0;
bool personDetected = false;
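
Both timestamps come from millis(), which on the ESP32 is a 32-bit unsigned millisecond counter that wraps around after roughly 49.7 days. The sketch stays correct across that wrap because it always subtracts two timestamps rather than comparing them directly. Here is a small host-side illustration of this property. It uses uint32_t to stand in for the ESP32’s 32-bit unsigned long, and the helpers elapsedMs and timedOut are hypothetical names that do not appear in the sketch:

```cpp
#include <cstdint>

// Elapsed milliseconds between two readings of a 32-bit millisecond
// counter. Unsigned subtraction wraps modulo 2^32, so the result stays
// correct even if the counter overflowed between 'start' and 'now'.
uint32_t elapsedMs(uint32_t now, uint32_t start) {
  return now - start;
}

// True once at least 'timeoutMs' milliseconds have passed since 'start'.
bool timedOut(uint32_t now, uint32_t start, uint32_t timeoutMs) {
  return elapsedMs(now, start) >= timeoutMs;
}
```

A comparison such as now >= start + timeoutMs would give the wrong answer around the wrap, which is why the subtraction form is the safer habit for long-running devices.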

Setup Function

The setup() function initializes serial communication for debugging (Serial) and communication with the Watcher (Serial1) at 115200 baud. The buzzer pin is configured as an output and initially turned off. The code waits for the serial port to be ready, then initializes the timing variables tPerson and tSitting with the current time in milliseconds. Finally, it prints “running…” to the Serial Monitor to indicate that the program has started.

void setup() {
  Serial.begin(115200);
  Serial1.begin(115200, SERIAL_8N1, D7, D6);  // RX, TX
  pinMode(BUZZER_PIN, OUTPUT);
  digitalWrite(BUZZER_PIN, LOW);

  while (!Serial)
    ;
  delay(100);
  tPerson = millis();
  tSitting = millis();
  Serial.println("running...");
}

Loop Function

The loop() function runs repeatedly and performs the core monitoring logic. It first reads the current time in milliseconds into tCurrent.

If data is available on Serial1 (from the Watcher), it attempts to parse the incoming JSON into doc. If the JSON contains the key "inference", it means a person has been detected. The code then prints how many seconds the person has been there and updates tPerson to the current time. If this is the first detection after a period of no detection, it also resets tSitting to start timing the sitting duration. The personDetected flag is set to true.

If more than 10 seconds have passed since the last person detection (tCurrent - tPerson > 10000), the code assumes the person has left, sets personDetected to false, and resets tSitting to the current time.

Finally, if a person is detected and the sitting time has reached or exceeded the maximum allowed time (MAX_SIT_TIME_MS), the buzzer is activated for 1 second (1000 ms) to remind the user to take a break. The sitting timer tSitting is reset after the buzzer sounds to start a new monitoring period.

void loop() {
  unsigned long tCurrent = millis();

  if (Serial1.available()) {
    deserializeJson(doc, Serial1);
    if (doc.containsKey("inference")) {
      Serial.printf("person sitting %lu sec\n", (tCurrent - tSitting)/1000);
      tPerson = tCurrent;
      if (!personDetected)
        tSitting = tCurrent;
      personDetected = true;
    }
  }

  if ((tCurrent - tPerson) > 1000 * 10) {
    personDetected = false;
    tSitting = tCurrent;
  }

  if (personDetected && ((tCurrent - tSitting) >= MAX_SIT_TIME_MS)) {
    tSitting = tCurrent;
    digitalWrite(BUZZER_PIN, HIGH);
    Serial.println("Take a break now!");
    delay(1000);
    digitalWrite(BUZZER_PIN, LOW);
  }
}

You can test the code by reducing MAX_SIT_TIME_MS to one minute (1000 * 60). If you sit at your desk for more than 60 seconds, the buzzer should sound. If you leave your desk before that, the timer will reset.
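
The timing logic can also be checked without any hardware by feeding it simulated timestamps. The following is a sketch of that idea in plain C++. The class SitTimer and its interface are hypothetical and only intended to mirror the three steps of loop() described above. The buzzer and the serial port are left out on purpose:

```cpp
#include <cstdint>

// Host-testable version of the sketch's timing logic (hypothetical class).
class SitTimer {
 public:
  SitTimer(uint32_t maxSitMs, uint32_t absenceMs, uint32_t now)
      : maxSitMs_(maxSitMs), absenceMs_(absenceMs), tPerson_(now), tSitting_(now) {}

  // Call once per loop iteration. 'detected' is true if a detection message
  // arrived in this iteration. Returns true when the buzzer should sound.
  bool update(uint32_t now, bool detected) {
    if (detected) {
      tPerson_ = now;
      if (!present_) tSitting_ = now;  // first detection starts a new period
      present_ = true;
    }
    if (now - tPerson_ > absenceMs_) {  // no detection for too long: person left
      present_ = false;
      tSitting_ = now;
    }
    if (present_ && now - tSitting_ >= maxSitMs_) {
      tSitting_ = now;  // restart the period after the reminder
      return true;
    }
    return false;
  }

  bool present() const { return present_; }

 private:
  uint32_t maxSitMs_, absenceMs_, tPerson_, tSitting_;
  bool present_ = false;
};
```

With a limit of 60000 ms and an absence timeout of 10000 ms, calling update(now, true) once per simulated second returns true for the first time at now = 60000, and a gap of more than ten seconds without detections makes present() return false again.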

Note that the person detection model just detects the presence of a person but not whether the person is sitting or not. However, you could use a skeletal model to detect specific poses and count only the time a person is actually sitting, for instance.

Conclusions

The SenseCAP Watcher W1-B combines an ESP32-S3 platform with integrated vision, audio, and AI capabilities. It enables you to build intelligent sensors at the edge. The task-based architecture and flexible connectivity make it straightforward to integrate with established tools such as Node-RED and Home Assistant.

For makers working with Arduino or ESP32 ecosystems, the Watcher can act as a high-level AI co-processor that offloads the compute-intensive AI sensing tasks. The application-specific and potentially complex evaluation logic can be implemented on a separate microcontroller that communicates with the Watcher via the serial interface. Note, however, that the SenseCAP Watcher operates with 3.3V logic, so a level shifter is required to connect it to a 5V board such as the Arduino UNO.

As an AI vision sensor, the SenseCAP Watcher is similar to the HUSKYLENS and HUSKYLENS 2 devices. However, the Watcher is more of a self-contained AI agent that can interpret scenes and act autonomously, whereas the HUSKYLENS and HUSKYLENS 2 are vision sensors designed to offload the interpretation of detected objects, and the reaction to them, to a microcontroller via UART/I2C interfaces.

If you have any questions feel free to leave them in the comment section.

Happy Tinkering 😉
