Record Audio with XIAO-ESP32-S3-Sense

In this tutorial you will learn how to record audio signals with the Seeed Studio XIAO-ESP32-S3-Sense board. If you haven’t used the XIAO-ESP32-S3 Sense before, have a look at the Getting started with XIAO-ESP32-S3-Sense tutorial first.

Required Parts

Obviously, you will need a XIAO ESP32 S3 Sense board by Seeed Studio to try out the code examples. Note that the board can get very hot, for instance when streaming video at a high frame rate. I recommend you attach a small heatsink to the back of the board (see the listed part below).

Seeed Studio XIAO ESP32 S3 Sense

USB C Cable

Small Heatsink 9×9 mm

Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.

Microphone of XIAO-ESP32-S3-Sense

The XIAO-ESP32-S3-Sense comes with a built-in digital MEMS Microphone of the type MSM261D3526H1CPM that is located on the Sense Hat. See the picture below:

Microphone on Sense Hat

You read the microphone via two signal lines (PDM_CLK, PDM_DATA), which are connected to IO42 and IO41 and handled by the I2S peripheral of the ESP32-S3, as shown in the schematic below:

Schematics for Microphone on Sense Hat

Note that on the back of the Sense Hat PCB there are two “Jumper” pads labeled J1 and J2 (red arrows). See the picture below:

Jumper pads on Sense Hat (source)

If you cut the thin trace between these pads (along the white lines), you disable the microphone and the GPIOs D11 and D12 on the Sense Hat become available. Otherwise they are used by the microphone. For more details see the Pin Multiplexing Information.

Read Microphone Signal from XIAO-ESP32-S3-Sense

As a first code example, we will display the audio signal detected by the microphone on the Serial Monitor and Serial Plotter:

#include "ESP_I2S.h"

const int8_t I2S_CLK = 42;
const int8_t I2S_DIN = 41;
const uint32_t SAMPLERATE = 16000;

I2SClass I2S;

void setup() {
  Serial.begin(115200);

  I2S.setPinsPdmRx(I2S_CLK, I2S_DIN);
  if (!I2S.begin(I2S_MODE_PDM_RX, SAMPLERATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Can't find microphone!");
    while (1)
      ;
  }
}

void loop() {
  int sample = I2S.read();
  if (sample > 1) {
    Serial.println(sample);
  }
}

Constants and Objects

The code starts by including the ESP_I2S library. We then define constants for the pins the I2S interface of the microphone is connected to, and for the sample rate:

#include "ESP_I2S.h"

const int8_t I2S_CLK = 42;
const int8_t I2S_DIN = 41;
const uint32_t SAMPLERATE = 16000;

Next we create the I2S object that allows us to receive data from the microphone via the I2S interface:

I2SClass I2S;

Setup Function

In the setup function we initialize the Serial interface, set the pins for the I2S interface and start the I2S communication:

void setup() {
  Serial.begin(115200);

  I2S.setPinsPdmRx(I2S_CLK, I2S_DIN);
  if (!I2S.begin(I2S_MODE_PDM_RX, SAMPLERATE, I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Can't find microphone!");
    while (1)
      ;
  }
}

The I2S_MODE_PDM_RX parameter configures the I2S peripheral to operate in PDM (Pulse-Density Modulation) receive mode. PDM microphones output a high-frequency bitstream where the density of ‘1’s corresponds to the signal amplitude. In this mode, the ESP32 hardware takes care of decoding this dense bitstream into usable PCM audio samples.
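To make the idea of "density of 1s" more concrete, here is a minimal sketch in plain C++ that turns a PDM bitstream into PCM samples by counting the 1s in fixed-size windows. This is only an illustration of the principle and not how the ESP32 hardware filter is actually implemented. The function name pdmToPcm and the decimation parameter are my own assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a PDM bitstream (one element per bit, 0 or 1) into 16-bit PCM by
// averaging fixed-size windows. `decimation` is a hypothetical parameter:
// the number of PDM bits that are folded into one PCM sample.
std::vector<int16_t> pdmToPcm(const std::vector<uint8_t>& bits,
                              std::size_t decimation) {
  std::vector<int16_t> pcm;
  if (decimation == 0) return pcm;  // nothing sensible to compute

  for (std::size_t start = 0; start + decimation <= bits.size();
       start += decimation) {
    int64_t ones = 0;
    for (std::size_t i = 0; i < decimation; ++i) {
      ones += bits[start + i] ? 1 : 0;
    }
    // Density 0.0 maps to -32767, density 0.5 to 0, density 1.0 to +32767.
    const int64_t centered = 2 * ones - static_cast<int64_t>(decimation);
    pcm.push_back(static_cast<int16_t>(
        centered * 32767 / static_cast<int64_t>(decimation)));
  }
  return pcm;
}
```

A window that contains only 1s gives the maximum positive sample, a window with only 0s the maximum negative sample, and an evenly mixed window gives a sample close to zero. Real decoders use proper low-pass filters instead of this simple average.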

The SAMPLERATE parameter defines the number of audio samples captured per second, typically measured in Hertz (Hz). In this code, it is set to 16000, meaning the microphone will be sampled 16,000 times every second. This sample rate is a good balance between capturing sufficient detail for human speech and keeping processing and storage requirements low. Lower sample rates reduce memory and CPU load, which is beneficial for battery-powered systems, but they also lower the highest frequency that can be captured (half the sample rate, so 8 kHz at 16,000 samples per second).
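The trade-off can be put into numbers with a few lines of plain C++. The helper names nyquistHz and pcmBytesPerSecond are made up for this sketch and are not part of the ESP_I2S library.

```cpp
#include <cstdint>

// Highest frequency that can be represented at a given sample rate
// (the Nyquist frequency, half the sample rate).
constexpr uint32_t nyquistHz(uint32_t sampleRateHz) {
  return sampleRateHz / 2;
}

// Amount of raw PCM data produced per second of audio.
constexpr uint32_t pcmBytesPerSecond(uint32_t sampleRateHz,
                                     uint32_t bitsPerSample,
                                     uint32_t channels) {
  return sampleRateHz * (bitsPerSample / 8) * channels;
}
```

With the settings used in this tutorial (16,000 Hz, 16 bit, mono) this gives a Nyquist frequency of 8,000 Hz and 32,000 bytes of audio data per second.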

The I2S_DATA_BIT_WIDTH_16BIT parameter sets the bit depth of each PCM audio sample to 16 bits. Bit depth refers to how precisely each audio sample represents the amplitude of the original sound wave. A 16-bit depth offers 65,536 possible amplitude levels per sample, which provides a good dynamic range for capturing subtle variations in volume and tone.
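As a quick check of these numbers, the following plain C++ sketch computes the number of amplitude levels and the theoretical dynamic range for a given bit depth. The function names are hypothetical, and the dynamic range is the ideal value from the formula 20·log10(levels). A real microphone will not reach it.

```cpp
#include <cmath>
#include <cstdint>

// Number of distinct amplitude levels for a given bit depth (bits < 64).
constexpr uint64_t amplitudeLevels(unsigned bits) {
  return 1ULL << bits;
}

// Ideal dynamic range in decibels: 20 * log10(number of levels).
inline double dynamicRangeDb(unsigned bits) {
  return 20.0 * std::log10(static_cast<double>(amplitudeLevels(bits)));
}
```

For 16 bits this yields 65,536 levels and roughly 96 dB, compared to 256 levels and roughly 48 dB for 8 bits.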

The I2S_SLOT_MODE_MONO parameter configures the I2S interface to operate in mono channel mode, meaning that only a single audio channel is used for data capture. This is appropriate when working with a single PDM microphone, as stereo (dual-channel) operation would be unnecessary and wasteful.

Loop Function

In the loop function we read a single audio sample via I2S.read() from the microphone. If the sample is greater than 1 (to filter out silence/noise floor) we print it to the serial monitor:

void loop() {
  int sample = I2S.read();
  if (sample > 1) {
    Serial.println(sample);
  }
}

If you run the code, open the Serial Plotter and whistle at a fixed frequency, you should see a nice sine wave on the Serial Plotter:

Audio Signal in Serial Monitor measured with Microphone

If you vary the frequency by whistling a bit lower or higher, you will see that the frequency of the displayed sine wave changes accordingly.
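If you want a number instead of eyeballing the plot, one simple option is to count zero crossings. The sketch below is plain C++ that you can try on a PC. It assumes it is given unfiltered, signed 16-bit samples centred on zero. makeSine is a hypothetical helper that stands in for the microphone data, and estimateFrequencyHz is likewise my own name.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate a test tone (stand-in for samples read from the microphone).
std::vector<int16_t> makeSine(double freqHz, uint32_t sampleRateHz,
                              std::size_t count, double amplitude = 10000.0) {
  const double kTwoPi = 6.283185307179586;
  std::vector<int16_t> out(count);
  for (std::size_t n = 0; n < count; ++n) {
    out[n] = static_cast<int16_t>(
        amplitude * std::sin(kTwoPi * freqHz * n / sampleRateHz));
  }
  return out;
}

// Estimate the dominant frequency by counting rising zero crossings.
// Works reasonably for clean, single-tone signals such as a whistle.
double estimateFrequencyHz(const std::vector<int16_t>& samples,
                           uint32_t sampleRateHz) {
  if (samples.size() < 2) return 0.0;
  std::size_t rising = 0;
  for (std::size_t i = 1; i < samples.size(); ++i) {
    if (samples[i - 1] < 0 && samples[i] >= 0) ++rising;
  }
  // One rising crossing per period: crossings per second equals frequency.
  return static_cast<double>(rising) * sampleRateHz / samples.size();
}
```

For a synthetic 1,000 Hz tone sampled at 16,000 Hz the estimate lands within a few Hertz of the true value. On noisy recordings this method becomes unreliable, and an FFT would be the better choice.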

Record Audio with XIAO-ESP32-S3-Sense

In this example I will show how to record 5 seconds of audio and write the data as an audio file in WAV format to the SD Card.

We want to start the recording by pressing a button and then record for the next 5 seconds. The following wiring shows you how to connect a button to pin D7 of the board:

Button on pin D7 of the XIAO-ESP32-S3-Sense

Below is the complete code for the project. Have a quick look first and then we will dive into its details:

#include "ESP_I2S.h"
#include "FS.h"
#include "SD.h"

const uint32_t SAMPLERATE = 16000;
const int LEN = 5;  // seconds
const byte btnPin = D7;
const byte ledPin = BUILTIN_LED;

I2SClass i2s;

void recordAudio() {
  static int cnt = 0;
  static char filename[64];
  uint8_t *wav_buffer;
  size_t wav_size;

  Serial.print("RECORDING ... ");
  wav_buffer = i2s.recordWAV(LEN, &wav_size);

  sprintf(filename, "/audio%d.wav", cnt++);
  File file = SD.open(filename, FILE_WRITE);
  file.write(wav_buffer, wav_size);
  file.close();
  free(wav_buffer);
  Serial.printf("COMPLETE => %s\n", filename);
}

void setup() {
  Serial.begin(115200);

  pinMode(btnPin, INPUT_PULLUP);
  pinMode(ledPin, OUTPUT);

  i2s.setPinsPdmRx(42, 41);
  if (!i2s.begin(I2S_MODE_PDM_RX, SAMPLERATE,
                 I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Can't find microphone!");
  }

  if (!SD.begin(21)) {
    Serial.println("Failed to mount SD Card!");
  }
}

void loop() {
  if (!digitalRead(btnPin)) {
    delay(500);
    digitalWrite(ledPin, LOW);
    recordAudio();
    digitalWrite(ledPin, HIGH);
  }
}

Libraries

The code begins by including three libraries:

#include "ESP_I2S.h"
#include "FS.h"
#include "SD.h"

The ESP_I2S.h library provides an abstraction for handling audio input via the I2S interface, which the PDM microphone uses. The FS.h and SD.h libraries enable file system access to the SD card, allowing audio files to be saved.

Constants

Next, several constants are defined:

const uint32_t SAMPLERATE = 16000;
const int LEN = 5;  // seconds
const byte btnPin = D7;
const byte ledPin = BUILTIN_LED;

The SAMPLERATE is set to 16,000 samples per second, a standard rate for intelligible speech recording. LEN specifies the recording duration in seconds. btnPin is assigned to the button input pin (D7), and ledPin is the onboard LED used for visual feedback during recording.

Objects

Next we create the I2SClass object, which is used to configure and control the I2S audio interface.

I2SClass i2s;

recordAudio Function

In the recordAudio() function we handle the audio recording and file saving:

void recordAudio() {
  static int cnt = 0;
  static char filename[64];
  uint8_t *wav_buffer;
  size_t wav_size;

A static counter cnt is used to create unique filenames for each recording. A buffer pointer wav_buffer and a variable wav_size are declared to hold the recorded audio data and its size.

  Serial.print("RECORDING ... ");
  wav_buffer = i2s.recordWAV(LEN, &wav_size);

The function begins by printing a message to the serial monitor. The recordWAV() function is then called on the i2s object, which records audio for the duration of LEN seconds, returns a pointer to the WAV data and stores its size in wav_size.
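To get a feeling for what ends up in that buffer, here is a sketch in plain C++ that builds a canonical 44-byte PCM WAV header and computes the size of the audio data. I am assuming the common 44-byte header layout here. The exact header written by recordWAV() may differ, and the names makeWavHeader and pcmDataBytes are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Write little-endian integers into a byte buffer.
static void putLE16(uint8_t* p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v & 0xFF);
  p[1] = static_cast<uint8_t>(v >> 8);
}
static void putLE32(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
}

// Size of the raw PCM data for a recording of the given length.
uint32_t pcmDataBytes(uint32_t sampleRate, uint16_t bitsPerSample,
                      uint16_t channels, uint32_t seconds) {
  return sampleRate * (bitsPerSample / 8u) * channels * seconds;
}

// Build a canonical 44-byte header for uncompressed PCM audio.
std::array<uint8_t, 44> makeWavHeader(uint32_t sampleRate,
                                      uint16_t bitsPerSample,
                                      uint16_t channels, uint32_t dataBytes) {
  std::array<uint8_t, 44> h{};
  const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
  std::memcpy(&h[0], "RIFF", 4);
  putLE32(&h[4], 36 + dataBytes);          // file size minus 8 bytes
  std::memcpy(&h[8], "WAVE", 4);
  std::memcpy(&h[12], "fmt ", 4);
  putLE32(&h[16], 16);                     // size of the fmt chunk
  putLE16(&h[20], 1);                      // audio format 1 = PCM
  putLE16(&h[22], channels);
  putLE32(&h[24], sampleRate);
  putLE32(&h[28], sampleRate * blockAlign);  // bytes per second
  putLE16(&h[32], blockAlign);
  putLE16(&h[34], bitsPerSample);
  std::memcpy(&h[36], "data", 4);
  putLE32(&h[40], dataBytes);
  return h;
}
```

For 5 seconds at 16,000 Hz, 16 bit, mono, the audio data takes 160,000 bytes, so under this assumption the complete buffer is about 160 KB. That is worth keeping in mind when you increase LEN, since the whole recording has to fit into memory at once.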

  sprintf(filename, "/audio%d.wav", cnt++);
  File file = SD.open(filename, FILE_WRITE);
  file.write(wav_buffer, wav_size);
  file.close();
  free(wav_buffer);

A filename is generated using the counter, such as /audio0.wav, /audio1.wav, etc. The SD card is accessed using SD.open, and the WAV data is written to the file. The memory used by wav_buffer is then freed to prevent memory leaks.

  Serial.printf("COMPLETE => %s\n", filename);
}

After saving, a completion message is printed to the serial monitor with the filename of the saved recording.

Setup Function

In the setup() function, we first initialize the Serial communication at 115200 baud to allow debugging messages.

void setup() {
  Serial.begin(115200);

Next, we configure the button as an input with an internal pull-up resistor, meaning it reads HIGH by default and LOW when pressed. The LED pin is set as an output.

  pinMode(btnPin, INPUT_PULLUP);
  pinMode(ledPin, OUTPUT);

Then we set GPIO 42 and 41 for the PDM microphone input, matching the default wiring for the onboard mic of the XIAO ESP32-S3 Sense.

  i2s.setPinsPdmRx(42, 41);

The I2S interface is initialized in PDM receive mode, with the specified sample rate, 16-bit width, and mono channel. If initialization fails, an error message is shown.

  if (!i2s.begin(I2S_MODE_PDM_RX, SAMPLERATE,
                 I2S_DATA_BIT_WIDTH_16BIT, I2S_SLOT_MODE_MONO)) {
    Serial.println("Can't find microphone!");
  }

Finally, we initialize the SD card using GPIO 21 as the CS (Chip Select) pin. If mounting fails, we print an error message.

  if (!SD.begin(21)) {
    Serial.println("Failed to mount SD Card!");
  }
}

Loop Function

Finally, the loop() function checks for button presses and starts the recording:

void loop() {
  if (!digitalRead(btnPin)) {
    delay(500);
    digitalWrite(ledPin, LOW);
    recordAudio();
    digitalWrite(ledPin, HIGH);
  }
}

The button is read; if it is LOW (pressed), the program waits 500 milliseconds to debounce the input, then turns on the LED (the built-in LED of the XIAO is active-low, so writing LOW switches it on), calls recordAudio(), and turns the LED back off when done. This provides a visual cue that recording is taking place.
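The fixed delay is the simplest way to debounce a button. As an alternative, here is a sketch of a small non-blocking debouncer in plain C++. The Debouncer class is my own illustration and not part of any library. It only reports a press after the raw input has been stable for a given time.

```cpp
#include <cstdint>

// Reports a button press once the raw input has been stable for
// `stableMs` milliseconds. Time is passed in, so the class itself
// does not depend on Arduino functions.
class Debouncer {
 public:
  explicit Debouncer(uint32_t stableMs) : stableMs_(stableMs) {}

  // rawPressed: current raw reading (true = pressed).
  // nowMs: current time in milliseconds, e.g. from millis().
  // Returns true exactly once per confirmed press.
  bool update(bool rawPressed, uint32_t nowMs) {
    if (rawPressed != lastRaw_) {  // input changed: restart the timer
      lastRaw_ = rawPressed;
      lastChangeMs_ = nowMs;
    }
    if (nowMs - lastChangeMs_ >= stableMs_ && stable_ != lastRaw_) {
      stable_ = lastRaw_;
      return stable_;  // true only on the transition to "pressed"
    }
    return false;
  }

 private:
  uint32_t stableMs_;
  uint32_t lastChangeMs_ = 0;
  bool lastRaw_ = false;
  bool stable_ = false;
};
```

In the sketch above you could create a global Debouncer with, say, 50 ms and call update(!digitalRead(btnPin), millis()) in loop(), starting the recording whenever it returns true. The negation is needed because the button reads LOW when pressed.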

Conclusions

In this tutorial you learned how to record audio using the XIAO-ESP32-S3 Sense.

If you need more information about the XIAO-ESP32-S3 Sense, have a look at the Getting started with XIAO-ESP32-S3-Sense tutorial. And for video streaming see our Stream Video with XIAO-ESP32-S3-Sense tutorial.

Finally, don’t forget to check out the Getting Started Wiki by Seeed Studio for the XIAO-ESP32-S3-Sense, which also provides rich information on the board and many code examples.

If you have any questions, feel free to leave them in the comment section.

Happy Tinkering ; )