Train an Object Detection Model with Edge Impulse for ESP32-CAM

In this tutorial you will learn how to train an object detection model with Edge Impulse for the ESP32-CAM. Data collection, data labelling and training of the model happen on the Edge Impulse platform, but the detection model will run directly on the ESP32-CAM.

Since the ESP32-CAM has limited processing power, don’t expect high accuracies or speeds. Nevertheless, there are many TinyML applications, where the ESP32-CAM is sufficient. For instance, you could build a face recognition smart lock that grants access only to you, a pet feeder that detects and feeds your pet when it appears, or a gesture recognition system to control lights or appliances with hand movements.

However, in this introduction to object detection with Edge Impulse we will keep it simple, and train a system that detects and distinguishes just two objects.

Required Parts

You will need an ESP32-CAM to run the object detection. You can get an ESP32-CAM with a USB-TTL Shield for programming or an FTDI USB-TTL Adapter. The FTDI USB-TTL Adapter is a bit more cumbersome to use but leaves the GPIO pins readily accessible. For this tutorial, I would recommend the USB-TTL Shield, but make sure you get the correct one (see the Programming the ESP32-CAM tutorial for details).

ESP32-CAM with USB-TTL Shield

FTDI USB-TTL Adapter

Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.

What is Edge Impulse?

Edge Impulse is a web platform designed to enable the creation of efficient machine learning models for embedded and edge devices. It makes it easy to build and deploy TinyML (Tiny Machine Learning) applications.

Edge Impulse Workflow
Edge Impulse Workflow (source)

Data Collection and Labeling

You can gather data from a wide variety of sensors including cameras, microphones, accelerometers, gyroscopes, environmental sensors, and more. There are several ways to collect data:

  1. Live data collection via USB/serial: Supported development boards (like the ESP32, Arduino Nano 33 BLE Sense, or ST boards) can be connected directly to the Edge Impulse Studio to stream sensor data in real time.
  2. Remote data collection using the Edge Impulse CLI: For headless or remote devices, users can use the CLI to send data over Wi-Fi or serial to the Edge Impulse platform.
  3. Mobile phone app: The Edge Impulse mobile app (available for Android and iOS) can be used to capture images, audio, and motion data directly from your phone, and push them to your project dashboard.
  4. Upload existing datasets: Users can import data in CSV, JSON, or custom formats and label them within the web interface.
  5. Continuous data collection: Some devices can be configured for continuous sampling and segmented afterward into manageable data windows.

Once data is collected, Edge Impulse provides tools for labeling. The labeling interface supports tagging individual samples or batch labeling large datasets. For visual and audio data, you can zoom in, crop, and annotate events or features of interest. The platform also supports automatic labeling and data augmentation to improve dataset quality.

Signal Processing and Model Training

After labeling, data is passed through DSP (Digital Signal Processing) blocks, which transform raw sensor data into features suitable for machine learning. These blocks are optimized for low-resource environments and are specific to the type of input (e.g., MFCCs for audio, image resizing and color correction for vision, or FFT for vibration data).

Next comes model training. Edge Impulse offers:

  • Classification, for detecting presence/absence or identifying classes (e.g., object types, spoken words).
  • Regression, for predicting continuous values (e.g., estimating angles, temperature, or object position).
  • Object detection, for detecting and locating multiple objects in images.
  • Anomaly detection, for identifying outliers or unusual patterns without requiring labeled data.

You can either use prebuilt model architectures from the Edge Impulse Model Zoo, which includes optimized versions of MobileNet, ResNet, and SqueezeNet among others, or define custom models using Keras and TensorFlow Lite. You can also import ONNX or TensorFlow models.

Model Optimization and Deployment

Once a model is trained, Edge Impulse helps optimize it for edge deployment using techniques like quantization (reducing model size and improving inference speed) and operator fusion.

The deployment is very flexible, with options including:

  • C++ SDK: A fully self-contained SDK that can be integrated into any C++ project. This is ideal for microcontrollers and bare-metal firmware.
  • Arduino Library: Automatically generate an Arduino-compatible library to run your model on boards like the ESP32, Arduino Portenta, and Nano 33 BLE.
  • Prebuilt firmware: For supported devices, Edge Impulse provides firmware binaries with the trained model baked in, ready to flash and run.
  • WebAssembly: Run models in the browser or on any system supporting WebAssembly.
  • Linux-based deployment: For devices like the Raspberry Pi, BeagleBone, or Nvidia Jetson, you can deploy using Python SDKs or Docker containers.

Model Zoo and Transfer Learning

Edge Impulse also provides you with a small Model Zoo containing lightweight, pre-optimized models tailored for edge use cases. These models include:

  • MobileNet variants for image classification
  • FOMO (Faster Objects, More Objects), a tiny object detection model suitable for real-time applications
  • Syntiant-compatible audio classifiers
  • Custom keyword spotting models
  • Regression and anomaly detection examples

Many of these models can be fine-tuned via transfer learning, allowing you to achieve high accuracy even with relatively small datasets by building on existing pretrained weights.

Edge Impulse Sign-Up

Before you can use Edge Impulse you need to sign up. The good news is that the Developer Plan is free. You will be limited to three private projects, but that is sufficient for trying things out. Go to https://studio.edgeimpulse.com/signup and enter your details in the form:

Edge Impulse signing-up form
Edge Impulse signing-up form

After signing up you can create a project, which we will discuss in the next section. If you run into any issues, the Edge Impulse Documentation may help as well.

Edge Impulse Create Project

To create a project go to https://studio.edgeimpulse.com/studio/profile/projects and click the “+ Create new project” button on the right. This will open a dialog where you can enter a project name, e.g. “esp32-cam-object-detection”, and set some project properties, as shown below:

Create Edge Impulse Project

For the project type, select “Personal” and make it a Private project. Then simply press the green “Create new project” button at the bottom.

Edge Impulse Dashboard

After creating a project you should be redirected to the Dashboard. There are a few sections I want to highlight for you:

Edge Impulse Dashboard
Edge Impulse Dashboard

In the center appears the name of your project, e.g. esp32-cam-object-detection, which you can edit. In the right upper corner you can select the target platform.

The AI Thinker ESP32-CAM does not appear as a supported board, but there is the Espressif ESP-EYE, which you should pick for this tutorial. With a small change, the generated code will work for the ESP32-CAM as well.

In the sidebar on the left are the functions for Data acquisition and Impulse design. An “Impulse” is a short data processing pipeline that contains the data preprocessing (e.g. scaling) and the neural network model.

Edge Impulse Data Acquisition

Edge Impulse supports various methods to collect data. We are going to use the easiest one, and that is to collect images with your smartphone.

Click on the “Data acquisition” item in the sidebar, which will open a new panel. On the right side, under “Collect data” click the link “Connect a device”:

Edge Impulse Connect a Device
Edge Impulse Connect a Device

This will open a dialog with three options. Use the QR code on the left with your mobile phone to open an App that allows you to take pictures as data samples with your phone:

Connect Device via QR code
Connect Device via QR code

Here is how the app will appear on your phone after you have given it permission to run and to take pictures:

Edge Impulse Data Collection App
Edge Impulse Data Collection App

Click the blue “Capture” button to take pictures, which will be sent to the Edge Impulse platform. There we will label them in a second step.

Makkuro Kurosuke and Car

For this tutorial I will use two objects to be detected, a Makkuro Kurosuke and a Car:

Makkuro Kurosuke and Car
Makkuro Kurosuke and Car

In case you don’t know, Makkuro Kurosuke (or Susuwatari) is a fictional character from the Studio Ghibli film My Neighbor Totoro, and can be loosely translated as “Little Soot-Sprite.” For the labelling, I will just call it “Kurosuke”.

These two objects are interesting to try object detection on. The Kurosuke is largely featureless, just a black ball that can easily be confused with other black objects. The car, on the other hand, is complex in shape, and the model needs to learn a good representation to detect it reliably.

When you collect the data, make sure you collect it under the same conditions the detection will later be performed in. Otherwise the detector is likely to make many errors when detecting objects.

For instance, cover different angles, distances, backgrounds and levels of illumination to make the detection robust. Below are some examples of the data I collected (and labelled):

Example Captures
Example Captures

Note, however, that the more the environmental conditions change, the harder the task becomes for the object detector and the more data you need to collect. If you want to keep the task simple, use the same background, angle and light – in other words: control the environment as much as possible.

Data Labelling

Once you have collected enough pictures for both objects – aim for a minimum of 30 pictures per object – you can start labelling the data. The pictures (samples) will appear on the left side, and if you click on one, you will get an enlarged view of the picture in the lower right corner:

Selecting pictures for labelling
Selecting pictures for labelling

There you can use your mouse to draw a bounding box around the object to detect and give it a label, in my case “Car” or “Kurosuke”:

Labelling the Car in a picture
Labelling the Car in a picture

You will need to do this for all your images and be consistent with the labelling of the two (or more) objects you want to detect.

The good news is that the Edge Impulse platform proposes a bounding box after the first labelling, which makes the labelling process much faster. You can change the size and location of the proposed bounding boxes by clicking on the drag or resize icons at the corners of the box.

Edge Impulse Create Impulse

Once the data collection and labelling is complete, click on “Create Impulse” under “Impulse Design” in the sidebar of the Dashboard:

Create Impulse
Create Impulse

This will open the panel with some processing blocks as shown below:

Object Detection Impulse
Object Detection Impulse

Initially, the blocks in the panel might be different, but you can delete them by clicking on the small trash bin icon in the lower right corner of each block. Then just add new blocks by clicking on “Add … block” and select and configure them:

Specifically, you need an “Image data” (input) block with an Image size of 96×96 pixels and a Resize mode of “Fit longer axis”.

Next comes an “Image” (processing) block and then an “Object Detection” (learning) block. The last block shows the Output features. You can see that it already lists the two objects (Car, Kurosuke) we want to detect.

Once done, you have created an “Impulse”, which is essentially a short data processing pipeline consisting of a feature processing block and a neural network block (model).

Edge Impulse Features

Next you can adjust and analyze the features that are extracted by the image processing block. Click on “Image” under “Impulse Design” in the sidebar of the Dashboard:

This will open a new panel, where you should select “Grayscale” for the Color depth parameter:

This means we will convert the images to grayscale, which is faster to process and consumes less memory. Consequently, the object detection will be faster, but the detection accuracy is likely to suffer a bit, since we lose the color information.

Feature Explorer

Next press the blue “Save parameters” button and then you may have to press “Generate Features” to run the Feature explorer:

Feature Explorer for grayscale images
Feature Explorer for grayscale images

The feature explorer converts each of our grayscale images into vectors with 96×96 = 9216 dimensions. It then projects them down to two dimensions and plots them onto a plane. So, each dot in the plot above represents an image, and the color indicates which object the image contains.

What you want to see here is dots of the same color arranged in tight clusters, with the clusters of the two different colors far apart. That indicates that the features are suitable to distinguish the two classes of objects. Below is a made-up example of a very good clustering that would indicate near-perfect features:

Clustering for near-perfect features
Clustering for near-perfect features

In my case, you can see that the dots are somewhat clustered, but the separation between the clusters is only OK. That means it is not a trivial detection problem, and the accuracy of the object detection will probably not be perfect.

However, the feature explorer just looks at the raw pixels, and the detection model has not been trained and evaluated yet. But the feature explorer allows you to try out different features. For instance, below is the feature plot for RGB (instead of grayscale) images:

Feature Explorer for RGB images
Feature Explorer for RGB images

As you can see, the clustering isn’t much better, if at all, so we stick with the grayscale images, since our inference time will be faster. In the next sections, I show you how to train and evaluate the model. Then we will have a better understanding of the detection performance.

Edge Impulse Train Model

To train the detection model click on “Object detection” in the sidebar:

This will open a panel on the right with settings for the training process and a blue “Save & train” button at the end:

Settings for model training
Settings for model training

You can keep the settings as they are and it will work fine. I myself increased the learning rate from 0.001 to 0.01, since the data set is small and the detection problem is not very hard.

A higher learning rate makes the training converge faster (the error/loss drops faster), and you may achieve a higher accuracy for the same number of training cycles. However, if the learning rate is too high, the training will not converge and the model will not learn.

Training Output

You can monitor the training process in the console window named Training output. There you should see a continuously decreasing number for the Train LOSS:

Training output
Training output

If the Train LOSS goes up or fluctuates, your learning rate is too high. The Validation LOSS should also continuously decrease. If you see it going up, you trained for too many epochs and should reduce the number of training cycles.

Once the training is complete, the platform shows the confusion matrix and other evaluation metrics based on the validation set:

Model evaluation
Model evaluation

In my case, I get a perfect confusion matrix and a perfect F1 SCORE of 100%. In the confusion matrix you can see that the model can perfectly (100%) distinguish between BACKGROUND, CAR and KUROSUKE.

However, this is a bit misleading and not the true performance of the model on new data. The model evaluation is performed on the validation set, and if the training and validation sets are very small (as they are in my case), the evaluation tends to be overly optimistic. For a more accurate estimate of the model performance, you should collect more training data.

You can also go to “Model testing” and evaluate the performance of the model on the test data:

As you can see below, the accuracy on the test set is only 75%.

Model evaluation on test data
Model evaluation on test data

But again, my test set is very small (12 samples), so the estimated accuracy will not be very reliable.

Edge Impulse Deploy Model

Now, we are ready to deploy the model to the ESP32-CAM. Click on “Deployment” under “Impulse Design” in the sidebar of the Dashboard:

This will open a panel with deployment options:

Deployment options
Deployment options

You can keep the default options, which should be “Arduino library”, “TensorFlow Lite” and “Quantized (int8)” as shown above. If you then click on the blue “Build” button at the bottom, the platform will build an Arduino .ZIP library, which you can download:

The name of this library is derived from the project name and the number of deployments. In my case it is ei-esp32-cam-object-detection-arduino-1.0.13.zip.

Install Detection Model on ESP32-CAM

Finally, we can install our trained detection model on the ESP32-CAM. Just install the downloaded library (ei-esp32-...zip) as usual in the Arduino IDE via Sketch -> Include Library -> Add .ZIP library.

If successful, you can open a sketch named “esp32_camera” that is located under File -> Examples -> esp32-cam-object-detection_inferencing -> esp32:

The name of the example will be the name of your project (in my case esp32-cam-object-detection) with the suffix “_inferencing”.

Define Camera Model

Now, before you can run the model there are two important changes to make! First, if you look at the code of the esp32_camera sketch, you will find constants defined for the ESP-EYE and the AI-THINKER boards. You must comment out or delete the line defining CAMERA_MODEL_ESP_EYE and uncomment the line defining CAMERA_MODEL_AI_THINKER as shown below:
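After the change, the camera-model selection near the top of the sketch should look roughly like this (check your generated sketch for the exact comments and the full list of supported models):

```cpp
// Select the camera model: disable the ESP-EYE definition and
// enable the AI-Thinker one for the ESP32-CAM.
//#define CAMERA_MODEL_ESP_EYE       // default in the generated example
#define CAMERA_MODEL_AI_THINKER      // AI Thinker ESP32-CAM
```

These defines control which GPIO pin mapping the sketch uses for the camera, so picking the wrong one results in camera initialization errors.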

If you now run the sketch it most likely will still not work! It requires the ESP32 Core Version 2.0.4 to be installed! More about that in the next section.

Install ESP32 Core 2.0.4

As of August 2025, the ESP32 Core is at version 3.3.0. But if you try to run the esp32_camera sketch with this version, you will get a “cam_hal: DMA overflow” error!

To install ESP32 Core 2.0.4 open the BOARDS MANAGER and select 2.0.4 for the esp32 core. Once installed, it should look like this:

ESP32 Core 2.0.4 installed

Now, you should be able to run the detection model without an error. The Arduino IDE will tell you that there are updates for your boards available:

but you don’t want to install those, since that would replace the ESP32 Core Version 2.0.4 with a newer version.

Compile and Run Detection Model on ESP32-CAM

Note that the first compilation will take considerable time, 15 minutes or more, depending on your system! Subsequent compilations will be quicker.

Once compiled and flashed, you can point your ESP32-CAM at some objects, and the detection results will be printed on the Serial Monitor:

ESP32-CAM with objects to detect
ESP32-CAM with objects to detect

You will see the label (class) of the detected object, e.g. “car”, the confidence of the detection, e.g. 0.726, and the coordinates of the bounding box:

Detection Results printed in Serial Monitor
Detection Results printed in Serial Monitor

The system also tells you how long a detection takes. In my case, it is 714ms. Note that the detection (inference) time is independent of the number of training samples and not much affected by the number of objects to detect. So, feel free to improve the performance of the model by collecting more training data.

Conclusions and Comments

In this tutorial you learned how to train an Object Detection Model with Edge Impulse and deploy the model on an ESP32-CAM.

Note that Edge Impulse offers a lot more than was covered in this tutorial, and I recommend reading the Edge Impulse Documentation. For instance, there is live testing of the model in a browser or on your mobile phone – pretty cool. Beyond the ESP32-CAM, many other deployment targets and options are supported. And while we collected the training data using a mobile phone, you can also collect the data from the device itself.

The latter is the better option but a bit more complex (in the case of the ESP32-CAM). Why is it better? Because the training data are then collected with the same camera the detection model will use. You will notice, if you try out the live classification via the phone, that the detection accuracy is better than what you get on the ESP32-CAM.

Drawbacks of Edge Impulse with ESP32-CAM

While Edge Impulse makes it really easy to train and deploy a vision model, there are a few drawbacks. The ESP32-CAM is not directly supported, but that requires only a minor code change. The bigger issue is that the code only runs with the outdated ESP32 2.0.4 Core (as of Aug 2025); otherwise you get the “cam_hal: DMA overflow” error when you try to run it.

Next, while the installation via a .ZIP library is convenient, it also means you are flooding your libraries folder with libraries if you build multiple projects. It would be better to copy the code directly into the sketch to keep it local. I tried that, but it didn’t work and probably requires some path manipulations. However, you can deploy the C++ code as well, though it won’t run out of the box in the Arduino IDE.

Object detection on PC

The compute power of the ESP32-CAM is pretty limited; detection is therefore fairly slow and the model is not very accurate. If you need faster and more accurate detection, you can stream the video from the ESP32-CAM to your PC and run the detection model there. See the Object Detection with ESP32-CAM and YOLO tutorial for details.

SenseCraft and XIAO ESP32-S3-Sense

If you just want to run a face or person detection model on a small microcontroller, have a look at the Face Detection with XIAO ESP32-S3-Sense and SenseCraft AI and the Edge AI Room Occupancy Sensor with ESP32 and Person Detection tutorials. There we use the SenseCraft platform and a XIAO ESP32-S3-Sense microcontroller. Deployment is easier, but you will need a second microcontroller.

Battery powered

Running a detection model on a microcontroller opens the door for a battery powered detection system. However, running the ESP32-CAM continuously still consumes quite a lot of battery. You could use a PIR sensor to only activate the detection system, if motion was detected. For details, see our Motion Activated ESP32-CAM tutorial. And, should you be completely new to the ESP32-CAM, I recommend the Programming the ESP32-CAM tutorial, as well.

If you have any questions feel free to leave them in the comment section.

Happy Tinkering 😉

mian saad karim

Tuesday 28th of October 2025

hey so have done all steps as the article says and installed the 2.0.4 core but still get this error in compilation? (FQBN: esp32:esp32:esp32cam Using board 'esp32cam' from platform in folder: C:\Users\ABID COMPUTERS\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.4 Using core 'esp32' from platform in folder: C:\Users\ABID COMPUTERS\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.4 ... cmd /c if exist "C:\\Users\\ABID COMPUTERS\\AppData\\Local\\Temp\\.arduinoIDE-unsaved2025928-12332-1atci3k.01je\\esp32_camera\\partitions.csv" COPY /y "C:\\Users\\ABID COMPUTERS\\AppData\\Local\\Temp\\.arduinoIDE-ue-phone-classification_inferencing.h:49, from C:\Users\ABID COMPUTERS\AppData\Local\Temp\.arduinoIDE-unsaved2025928-12332-1atci3k.01je\esp32_camera\esp32_camera.ino:27: c:\users\abid computers\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\gcc8_4_0-esp-2021r2-patch3\xtensa-esp32-elf\include\c++\8.4.0\system_error:39:10: fatal error: bits/error_constants.h: No such file or directory #include ^~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. Alternatives for bits/error_constants.h: [] ResolveLibrary(bits/error_constants.h) -> candidates: [] exit status 1

Compilation error: exit status 1)

Stefan Maetschke

Tuesday 28th of October 2025

That looks like something is wrong with the installation of the ESP32 core. I would try the following:

1) Close Arduino IDE. 2) Remove the ESP32 core completely 3) Open: C:\Users\ABID COMPUTERS\AppData\Local\Arduino15\packages\esp32\ - Delete the whole esp32 folder 4) Reinstall ESP32 core 2.0.4

Codes

Sunday 31st of August 2025

I think I have a newer version of the board and I'm getting the: cam_hal: DMA overflow error as you mentioned! Any solutions or suggestions to fix this issue?

Stefan Maetschke

Sunday 31st of August 2025

Hi, this error is not related to the board but to the library code. As of Aug 2025 it only works with the old ESP32 2.0.4 Core, not with the current ESP32 3.3.x core. You will need to install the ESP32 2.0.4 Core. How to do that is described in the article.