In this post, you will learn how to use the MAX98357A amplifier module with the ESP32 to play audio.
The MAX98357A is a compact digital-to-analog converter (DAC) with a built-in amplifier. It receives digital audio data via the I2S protocol and outputs amplified audio directly to a speaker. Having the DAC and the amplifier on one module keeps the wiring simple, and the sound quality is typically better than with the 8-bit DAC built into the classic ESP32.
Throughout this tutorial, you will learn how to generate audio signals, convert text to speech, stream internet radio, play MP3 files from an SD card, and use Bluetooth audio.
Required Parts
You will need at least one MAX98357A module and one loudspeaker. If you want to play stereo, you will need two MAX98357A modules and two loudspeakers. The loudspeakers should have an impedance of 4 to 8 Ω and be rated for at least 3 W.
To play MP3 files from an SD Card, you will also need an SD Card with at least 1GB and an SD Card reader module.
Finally, you need an ESP32, a breadboard and some cables. I used an ESP32 lite, but most other ESP32 boards should work as well. If you plan to store and play music from memory, preferably get an ESP32-S3 with PSRAM.

2 x MAX98357A Amplifier

2x Loudspeaker 4Ω 5W

Micro SD Card Reader

Micro SD Card 8GB

ESP32 lite

USB Data Cable

Dupont Wire Set

Breadboard
Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.
The I2S Audio Protocol
Let’s start with a quick introduction to the I2S protocol, which is used to transfer audio data digitally from an ESP32 to an amplifier module such as the MAX98357A.
I2S, or Inter-IC Sound, is a serial bus interface standard used for connecting digital audio devices. It was introduced by Philips in the 1980s to simplify the transmission of audio data between integrated circuits. Unlike protocols such as SPI or I2C, I2S is specifically designed for audio applications, ensuring precise timing and synchronization of audio streams.
At its core, I2S transfers Pulse Code Modulation (PCM) audio data in a synchronous manner. The protocol uses three main signals: the serial clock (SCK), word select (WS), and serial data (SD). The serial clock pulses at the bit rate, dictating when bits are sent. The word select signal toggles to indicate whether the current data corresponds to the left or right audio channel. Finally, the serial data line carries the actual audio bits, transmitting the most significant bit (MSB) first.

Typically, audio data is sent in 16-bit or 24-bit words, but the protocol can support other bit depths depending on the hardware.
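The clock rates follow directly from these parameters. The host-side C++ sketch below (plain C++, not ESP32 code) computes them, assuming a two-channel frame in which each channel slot is exactly as wide as the sample. If a controller pads the slots (for example to 32 bits), the bit clock is correspondingly higher. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Bit clock (BCLK) in Hz: one clock pulse per transmitted data bit.
// Assumes each channel slot is exactly bitsPerSample wide (no padding).
uint32_t bitClockHz(uint32_t sampleRate, uint32_t bitsPerSample, uint32_t channels = 2) {
  assert(sampleRate > 0 && bitsPerSample > 0 && channels > 0);
  return sampleRate * bitsPerSample * channels;
}

// The word select (WS/LRC) line completes one full period per stereo frame,
// so its frequency equals the sample rate.
uint32_t wordSelectHz(uint32_t sampleRate) {
  return sampleRate;
}

// Raw PCM payload in bytes per second (8 bits per byte).
uint32_t bytesPerSecond(uint32_t sampleRate, uint32_t bitsPerSample, uint32_t channels = 2) {
  return bitClockHz(sampleRate, bitsPerSample, channels) / 8;
}
```

For CD-quality audio (44.1 kHz, 16 bits, stereo), bitClockHz(44100, 16) gives 1,411,200 Hz and bytesPerSecond(44100, 16) gives 176,400 bytes per second.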
The ESP32’s I2S peripheral supports full-duplex communication, allowing simultaneous audio input and output. It can be configured to operate in master or slave mode. When acting as the master, the ESP32 generates the clock and word select signals. This is the common setup when interfacing with the MAX98357A amplifier, which acts as an I2S slave device. In the next section we have a closer look at the MAX98357A.
Technical Features of the MAX98357A Module
The MAX98357A module is a digital amplifier that accepts I2S input directly. It converts the incoming digital audio stream into an analog signal and amplifies it to drive speakers. It integrates a 3.2W Class-D amplifier with a built-in digital-to-analog converter (DAC). The digital audio interface recognizes up to 35 different PCM and TDM clocking schemes, which eliminates the need for I2C programming.

The MAX98357A supports 16-bit, 24-bit, and 32-bit audio samples at sample rates ranging from 8 kHz up to 96 kHz. This flexibility enables high-quality audio playback suitable for voice, music, and other sound applications.
The module operates from a single 3.3V or 5V power supply, making it compatible with most microcontrollers, including the ESP32. It features a built-in low-dropout regulator and thermal shutdown protection. The picture below shows the front and back of a typical MAX98357A Module:

On the input side, the MAX98357A expects three main signals: bit clock (BCLK), word select (LRC), and serial data (DIN). The bit clock synchronizes the data bits, while the word select indicates whether the current data corresponds to the left or right audio channel. The serial data line carries the actual audio samples in a continuous stream.

Internally, the module converts the incoming digital audio data into an analog signal using its integrated DAC. The Class-D amplifier stage then amplifies this signal and drives a 4Ω or 8Ω speaker with up to about 3W.
The MAX98357A has a configurable gain (GAIN pin) and, depending on the SD pin, outputs the left channel, the right channel, or a mix of both channels of the stereo input. In the following section we discuss these pins in more detail.
Pinout of the MAX98357A Module
A typical MAX98357A Module has pins for power supply (Vin, GND), for the I2S interface (LRC, BCLK, DIN), for gain control (GAIN) and for channel selection and shutdown (SD). The picture below shows the pinout:

Power Supply
The VIN pin accepts 3.3V to 5V. The GND pin is the ground reference and must be connected to the ESP32 ground.
For moderately loud audio playback, the MAX98357A typically draws between 200mA and 400mA. The voltage regulator of a typical ESP32 board is rated for around 800mA, but the ESP32 lite uses an ME6211 rated for only 500mA.
This means you can power a MAX98357A from the 3.3V pin of an ESP32, but it is risky: at loud volume and maximum gain the current can go up to 1.5A (at 5V). It is therefore better to power the MAX98357A from an external power source.
Note that the MAX98357A includes a speaker output current limiter of 2.8A. If the output current exceeds this, the chip will temporarily shut down and restart to protect itself.
I2S Interface
The LRC pin stands for Left-Right Clock or Word Select. It indicates whether the current audio data belongs to the left or right audio channel. The ESP32 generates this signal as part of the I2S protocol.
The BCLK pin is the Bit Clock. It synchronizes the data bits sent over the DIN line. The ESP32 also provides this clock signal.
The DIN pin is the Data Input line. This pin receives the actual audio data from the ESP32.
Gain Control
You can set the gain of the MAX98357A by connecting the GAIN pin to VCC or GND, either directly or via a 100K resistor. The following table shows which connection results in which gain:
| Gain | Connection |
|---|---|
| 15dB | GAIN — 100K — GND |
| 12dB | GAIN — GND |
| 9dB | None |
| 6dB | GAIN — VCC |
| 3dB | GAIN — 100K — VCC |
The default gain is 9dB. It requires no connection, so simply leave the GAIN pin unconnected.
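Since the gain values are given in dB, it can help to see what they mean as amplitude factors. The host-side C++ sketch below encodes the table above and converts dB to a linear factor with 10^(dB/20), treating the table values as voltage gains. The GainStrap enum and both functions are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Ways to strap the GAIN pin, as listed in the table above.
enum class GainStrap {
  ResistorToGnd,  // GAIN - 100K - GND
  DirectToGnd,    // GAIN - GND
  Unconnected,    // default
  DirectToVcc,    // GAIN - VCC
  ResistorToVcc   // GAIN - 100K - VCC
};

// Gain in dB for each strapping option (values from the table above).
int gainDb(GainStrap strap) {
  switch (strap) {
    case GainStrap::ResistorToGnd: return 15;
    case GainStrap::DirectToGnd:   return 12;
    case GainStrap::Unconnected:   return 9;
    case GainStrap::DirectToVcc:   return 6;
    case GainStrap::ResistorToVcc: return 3;
  }
  assert(false && "unknown GainStrap value");
  return 9;
}

// Convert a voltage gain in dB to a linear amplitude factor: 10^(dB/20).
double dbToAmplitude(double db) {
  return std::pow(10.0, db / 20.0);
}
```

For example, going from the default 9dB to 15dB (100K to GND) multiplies the output amplitude by about 2 (a 6dB step), and each 3dB step corresponds to a factor of roughly 1.41.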
Channel Control
The SD (Shutdown) pin of the MAX98357A lets you shut down the amplifier or select the left, right or mixed output from the stereo input. The picture below shows the schematic of the MAX98357A module with the SD Mode pin and information on how to select a channel:

If the SD pin is not connected (default), the amplifier outputs a mix of the left and right channel ((L+R)/2). If the SD pin is connected to ground, the amplifier is turned off. Otherwise, the voltage on SD determines whether the left or right channel is amplified. The following table shows the voltages required at the SD pin to activate the different functions:
| Voltage at SD | Function |
|---|---|
| < 0.16V | Amplifier off |
| 0.16 … 0.77V | Left and right channel mixed ((L+R)/2) |
| 0.77 … 1.4V | Right channel only (370kΩ @ 5V, 210kΩ @ 3.3V) |
| > 1.4V | Left channel only (100kΩ) |
To get the voltage required at the SD pin for the right channel, connect the SD pin to VCC via a suitable resistor. The schematic gives the following formula, with VDD in volts:
R = (94 × VDD − 100) kΩ
This means you need a 370kΩ resistor between SD and 5V, or a 210kΩ resistor between SD and 3.3V (94 × 3.3 − 100 ≈ 210).
To select the left channel, use a 100kΩ resistor at either 5V or 3.3V. And if you want mixed channel output (summed mono), just leave the SD pin unconnected.
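The channel selection logic and the resistor formula fit into a few lines of host-side C++. This is a sketch based only on the nominal thresholds in the table and the formula above. sdModeFromVoltage() and rightChannelResistorKOhm() are hypothetical helpers, and in a real circuit you should stay clear of the thresholds because of component tolerances.

```cpp
#include <cassert>

// Amplifier modes selected by the voltage on the SD pin.
enum class SdMode { Shutdown, MixedMono, RightOnly, LeftOnly };

// Map an SD pin voltage (in volts) to a mode using the nominal thresholds:
// < 0.16 V off, < 0.77 V (L+R)/2, < 1.4 V right channel, otherwise left.
SdMode sdModeFromVoltage(double volts) {
  assert(volts >= 0.0);
  if (volts < 0.16) return SdMode::Shutdown;
  if (volts < 0.77) return SdMode::MixedMono;
  if (volts < 1.4)  return SdMode::RightOnly;
  return SdMode::LeftOnly;
}

// Resistor between SD and the supply rail (in kOhm) that selects the
// right channel: R = 94 * VDD - 100, with VDD in volts.
double rightChannelResistorKOhm(double vdd) {
  assert(vdd > 100.0 / 94.0);  // below ~1.06 V the formula is not positive
  return 94.0 * vdd - 100.0;
}
```

rightChannelResistorKOhm(5.0) returns 370 and rightChannelResistorKOhm(3.3) about 210.2, matching the resistor values used in this tutorial.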
Datasheet
You can find additional technical details in the datasheet of the MAX98357A.
Connect MAX98357A to ESP32
In this section you will learn the different ways to connect the MAX98357A and speakers to an ESP32.
Summed Mono
We start with the simplest configuration. We are using a single MAX98357A and a single speaker to play a mixed stereo signal (50% left channel, 50% right channel).
Start by connecting Vin of the MAX98357A to 3.3V of the ESP32 and GND to G. Next connect the I2S pins as shown in the following table:
| MAX98357A | ESP32 |
|---|---|
| Vin | 3V3 |
| GND | G |
| LRC | 32 |
| BCLK | 25 |
| DIN | 33 |
Leave the GAIN and the SD pin of the MAX98357A unconnected. This means a left and right channel mix with a gain of 9dB will be generated.
When connecting the speaker, watch out for the correct polarity and use a 4Ω or 8Ω speaker rated for at least 3W. Speakers with a higher power rating are fine, but do not use speakers rated lower.
The following picture shows the complete wiring of a MAX98357A with an ESP32 lite for summed mono sound (mix) with power supplied via the 3.3V pin:

As mentioned before, the MAX98357A may draw up to 1.5A when playing loud sound at maximum gain. The 3.3V pin of the ESP32 cannot provide that much current, so you should use an external power supply. The picture below shows how to connect an external 5V power supply to the circuit:

Stereo
If you want to play stereo sound, you need two speakers, two MAX98357A modules (one per channel) and resistors to select the left or right channel. The picture below shows the complete wiring:

Both MAX98357A modules are wired in parallel, using the same power and I2S pins as before. The only difference is that the MAX98357A for the left channel has a 100kΩ resistor between SD and 3.3V, while the MAX98357A for the right channel has a 210kΩ resistor.
Note that the ESP32 has to produce a stereo signal and the MAX98357A simply selects the left or right channel of the stereo input depending on the voltage on the SD pin.
As before, instead of powering the MAX98357A modules from the 3.3V pin of the ESP32, the safer option is to use an external power supply. The wiring diagram shows how to use an external 5V power supply:

Note that the SD pin of each MAX98357A is still connected to 3.3V via a 100kΩ or 210kΩ resistor. You can also connect the resistors to the external 5V supply, but then you should use a 370kΩ resistor instead of the 210kΩ resistor for the right channel. The 100kΩ resistor for the left channel can stay as it is.
SD Card
If you want to play audio files, you need to connect an SD Card reader that reads the audio files from an SD Card. The wiring diagram below shows how to connect an SD Card reader and the MAX98357A to an ESP32:

The SD Card Reader communicates via SPI, and the default SPI pins of the ESP32 are CS=5, MOSI=23, CLK=18 and MISO=19. The table below summarizes the connections you need to make between the SD Card Reader and the ESP32:
| SD Card Reader | ESP32 |
|---|---|
| 3V3 | 3V |
| GND | G |
| CS/SS | 5 |
| MOSI | 23 |
| CLK/SCK | 18 |
| MISO | 19 |
If you are not sure which pins are the default SPI pins of your ESP32 have a look at the Find I2C and SPI default pins tutorial.
If you want to play stereo sound you need to connect two MAX98357A modules. The following diagram shows you how to do that:

If you need more help connecting the SD Card Reader see our SD Card Module with ESP32 tutorial.
Installing Libraries
There are several Arduino libraries you can use to generate audio for an I2S device such as the MAX98357A. In the following, I quickly discuss the three most commonly used ones.
Firstly, there is the ESP8266Audio library by Earle F. Philhower, which supports ESP8266, ESP32, Raspberry Pi Pico RP2040 and Pico 2 RP2350 boards.
Then, there is the ESP32-audioI2S library by schreibfaul1. Note that this library only works on multi-core chips such as the ESP32, ESP32-S3 and ESP32-P4, and your board must have PSRAM. It does not work on ESP32-S2 or ESP32-C3 boards.
Finally, there is the arduino-audio-tools library by Phil Schatzmann, which is the most powerful of the three and offers a very large number of functions. This is the library we are going to use in this tutorial, but the other libraries are definitely worth checking out as well.
Install ESP32 core
As of Jan 2026, I could not get the arduino-audio-tools library working with the current ESP32 core (Version 3.3.6). For the examples in this tutorial you need to downgrade the ESP32 core to Version 2.0.17.
Assuming you already have installed the ESP32 core, downgrading is easy. Open the BOARDS MANAGER, type “esp32” in the search bar and then select the 2.0.17 version for the “esp32 by Espressif” core and click on “UPDATE” as shown below:

If you need more help downgrading or need to install the ESP32 core, see the Install ESP32 core in Arduino IDE tutorial.
Install arduino-audio-tools Library
To install the arduino-audio-tools library go to the arduino-audio-tools repo, click on the green “<> Code” button and then “Download ZIP” to download the library as a ZIP file as shown below:

Then open a Sketch, go to Sketch -> Include Library -> Add .ZIP Library … to install the downloaded ZIP library (arduino-audio-tools-main.zip):

For some of the code examples we need two more libraries by Phil Schatzmann: the arduino-libhelix library and the ESP32-A2DP library. You can install them the same way. Click on the links to go to the GitHub repos, click on the green “<> Code” button to download the libraries (arduino-libhelix-main.zip, ESP32-A2DP-main.zip) and then install them via Add .ZIP Library.
Play Test Sound
Before trying something more complex, let us first play a test sound. This lets us verify the wiring of the MAX98357A with the ESP32 and check that the left and right channels are correctly selected when playing stereo sound. The code does not rely on any third-party libraries. It uses the ESP32's legacy I2S driver (driver/i2s.h), which is available in both the current (3.x) and the old (2.x) ESP32 core, although the 3.x core may show a deprecation warning for it.
#include <driver/i2s.h>
#include <math.h>

#define I2S_PORT I2S_NUM_0

// MAX98357
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25

// Audio parameters
#define SAMPLE_RATE 44100
#define TONE_FREQ 500
#define AMPLITUDE 1000  // Max 32767

// Channels
#define LEFT false
#define RIGHT true

// Buffer size (frames, not samples)
#define BUFFER_LEN 256

// Stereo buffer: Left, Right
int16_t samples[BUFFER_LEN * 2];

void setup() {
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = BUFFER_LEN,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .mck_io_num = I2S_PIN_NO_CHANGE,  // no master clock (MCLK) output needed
    .bck_io_num = MAX_BCLK,
    .ws_io_num = MAX_LRC,
    .data_out_num = MAX_DIN,
    .data_in_num = I2S_PIN_NO_CHANGE
  };

  i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_PORT, &pin_config);
}

void loop() {
  static float phase = 0.0;
  const float phaseIncrement = 2.0 * PI * TONE_FREQ / SAMPLE_RATE;

  for (int i = 0; i < BUFFER_LEN; i++) {
    int16_t sound = (int16_t)(AMPLITUDE * sin(phase));
    int16_t silence = 0;
    samples[2 * i] = (LEFT) ? sound : silence;       // Left channel
    samples[2 * i + 1] = (RIGHT) ? sound : silence;  // Right channel
    phase += phaseIncrement;
    if (phase >= 2.0 * PI) phase -= 2.0 * PI;
  }

  size_t bytes_written;
  i2s_write(I2S_PORT, samples, sizeof(samples), &bytes_written, portMAX_DELAY);
}
The code configures the ESP32 as an I2S master transmitter, generates a sine wave audio signal in real-time, and sends it to the amplifier to produce sound on either the left, right or both channels.
Imports
The code includes the driver/i2s.h library, which provides the ESP32 I2S driver functions for audio communication. It also includes the standard math library math.h to use mathematical functions like sin() for generating the audio waveform.
#include <driver/i2s.h>
#include <math.h>
Constants
Several constants are defined to configure the I2S interface and audio parameters. I2S_PORT selects the I2S peripheral number 0 on the ESP32. The pins MAX_DIN, MAX_LRC, and MAX_BCLK correspond to the data, word select (left-right clock), and bit clock lines connected to the MAX98357 amplifier.
Audio parameters include the sample rate of 44,100 Hz, a tone frequency of 500 Hz, and an amplitude of 1000 (out of a 16-bit signed integer range). Most importantly, the code also defines boolean flags LEFT and RIGHT to control which audio channels output sound.
Finally, BUFFER_LEN sets the number of audio frames per buffer, and a stereo buffer array samples holds the interleaved left and right channel audio samples.
#define I2S_PORT I2S_NUM_0

// MAX98357
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25

// Audio parameters
#define SAMPLE_RATE 44100
#define TONE_FREQ 500
#define AMPLITUDE 1000  // Max 32767

// Channels
#define LEFT false
#define RIGHT true

// Buffer size (frames, not samples)
#define BUFFER_LEN 256

// Stereo buffer: Left, Right
int16_t samples[BUFFER_LEN * 2];
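The buffer constants determine how much audio each call to i2s_write() carries. The host-side C++ sketch below works this out for the values above; the constant and function names are illustrative and simply mirror the #defines of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kSampleRate = 44100;  // mirrors SAMPLE_RATE
constexpr size_t kBufferLen = 256;       // mirrors BUFFER_LEN (frames)
constexpr size_t kChannels = 2;          // interleaved left/right

// Bytes handed to i2s_write() per loop() call: frames x channels x 2 bytes.
constexpr size_t bufferBytes(size_t frames, size_t channels) {
  return frames * channels * sizeof(int16_t);
}

// Audio duration covered by one buffer, in microseconds.
constexpr double bufferDurationUs(size_t frames, uint32_t sampleRate) {
  return frames * 1e6 / sampleRate;
}

// Index of a sample in the interleaved array (channel 0 = left, 1 = right).
size_t interleavedIndex(size_t frame, size_t channel) {
  assert(channel < kChannels);
  return frame * kChannels + channel;
}
```

With these values each loop() iteration writes 1,024 bytes, which covers about 5.8 ms of audio, so the eight DMA buffers (dma_buf_count = 8) hold roughly 46 ms in total.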
Setup function
The setup() function initializes the I2S driver and configures the pins for communication with the MAX98357 amplifier.
An i2s_config_t structure is created to specify the I2S mode as master transmitter, 16-bit samples, stereo channel format (right and left), and a sample rate of 44.1 kHz. It also sets DMA buffer parameters for efficient data transfer.
Next, an i2s_pin_config_t structure assigns the GPIO pins for bit clock (bck_io_num), word select (ws_io_num), and data output (data_out_num). The data input pin is not used and set to I2S_PIN_NO_CHANGE.
Finally, the I2S driver is installed with i2s_driver_install() and the pin configuration applied with i2s_set_pin().
void setup() {
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = BUFFER_LEN,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .mck_io_num = I2S_PIN_NO_CHANGE,  // no master clock (MCLK) output needed
    .bck_io_num = MAX_BCLK,
    .ws_io_num = MAX_LRC,
    .data_out_num = MAX_DIN,
    .data_in_num = I2S_PIN_NO_CHANGE
  };

  i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_PORT, &pin_config);
}
Loop function
The loop() function continuously generates audio samples and sends them to the I2S peripheral for playback.
A static variable phase keeps track of the current position within the sine wave cycle. The phaseIncrement is calculated based on the desired tone frequency and sample rate, determining how much the phase advances per sample.
Inside the loop, the code fills the samples buffer with interleaved stereo audio data. For each frame, it calculates the sine of the current phase, scales it by the amplitude, and casts it to a 16-bit integer. Depending on the LEFT and RIGHT flags, the sound is assigned to the left and/or right channels, while the other channel is set to silence (zero).
The phase is incremented and wrapped around to stay within the range of 0 to 2π radians, ensuring a continuous waveform.
After filling the buffer, the i2s_write() function sends the audio data to the I2S driver, which transmits it to the amplifier. The function blocks until all bytes are written, maintaining smooth audio playback.
void loop() {
  static float phase = 0.0;
  const float phaseIncrement = 2.0 * PI * TONE_FREQ / SAMPLE_RATE;

  for (int i = 0; i < BUFFER_LEN; i++) {
    int16_t sound = (int16_t)(AMPLITUDE * sin(phase));
    int16_t silence = 0;
    samples[2 * i] = (LEFT) ? sound : silence;       // Left channel
    samples[2 * i + 1] = (RIGHT) ? sound : silence;  // Right channel
    phase += phaseIncrement;
    if (phase >= 2.0 * PI) phase -= 2.0 * PI;
  }

  size_t bytes_written;
  i2s_write(I2S_PORT, samples, sizeof(samples), &bytes_written, portMAX_DELAY);
}
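Because the sample generation does not depend on the I2S hardware, you can also check it on a PC. The host-side C++ sketch below re-implements the body of loop() as a function (fillToneBuffer() is a hypothetical name), so you can confirm, for example, that with LEFT set to false every left-channel slot is zero. It mirrors the logic of the sketch but is not a drop-in replacement for it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill `frames` interleaved stereo frames (left, right, left, right, ...)
// with a sine tone on the enabled channels and silence on the others.
// Returns the updated phase so consecutive buffers join seamlessly.
float fillToneBuffer(std::vector<int16_t>& samples, size_t frames, float phase,
                     float toneHz, float sampleRate, float amplitude,
                     bool left, bool right) {
  assert(sampleRate > 0.0f);
  const float kTwoPi = 6.28318530718f;
  const float phaseIncrement = kTwoPi * toneHz / sampleRate;
  samples.assign(frames * 2, 0);  // start with silence on both channels
  for (size_t i = 0; i < frames; i++) {
    const int16_t sound = static_cast<int16_t>(amplitude * std::sin(phase));
    if (left)  samples[2 * i]     = sound;  // left slot
    if (right) samples[2 * i + 1] = sound;  // right slot
    phase += phaseIncrement;
    if (phase >= kTwoPi) phase -= kTwoPi;   // keep phase in [0, 2*pi)
  }
  return phase;
}
```

At 500 Hz and 44.1 kHz the phase advances by about 0.071 radians per frame, so one sine period spans roughly 88 frames.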
You can use this code to verify that stereo sound is played correctly when using two MAX98357 modules and two speakers. If you set the LEFT constant to true and the RIGHT constant to false, only the left speaker should play sound. In the same way, you can check that the right channel plays sound correctly. If not, check the voltage and the resistor on the SD pin of the MAX98357, which selects the channel.
For the following examples we will use the arduino-audio-tools library, which will hide all the details of the I2S communication and will simplify the code.
Text to speech
This next example demonstrates how to convert text to speech using the I2S interface with a MAX98357 amplifier. It connects to WiFi, sends text to the OpenAI Text-to-Speech (TTS) API, receives an MP3 audio stream, decodes it, and plays it through the amplifier. The code uses the arduino-audio-tools library to handle audio streaming and decoding.
/*
  www.makerguides.com

  Libraries:
  - ESP32 Core 2.0.17
  - [arduino-audio-tools](https://github.com/pschatzmann/arduino-audio-tools)
    Version: 1.2.2
  - [arduino-libhelix](https://github.com/pschatzmann/arduino-libhelix)
    Version: 0.9.2
*/

#include <Arduino.h>
#include <WiFi.h>
#include <WiFiClientSecure.h>
#include "AudioTools.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"

// MAX98357 I2S pins
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25

// Text to Speech
#define TTS_MODEL "gpt-4o-mini-tts"
#define TTS_VOICE "marin"
#define TTS_VOLUME 0.6

// WiFi credentials
const char* ssid = "ssid";
const char* password = "pwd";

// OpenAI configuration
const char* openaiHost = "api.openai.com";
const int openaiPort = 443;
const char* openaiApiKey = "apikey";

WiFiClientSecure client;
I2SStream i2s;
VolumeStream volume(i2s);
EncodedAudioStream mp3decode(&volume, new MP3DecoderHelix());
StreamCopy copier(mp3decode, client);

void text2speech(const char* text) {
  client.setInsecure();  // skip TLS certificate verification
  if (!client.connect(openaiHost, openaiPort)) {
    Serial.println("Connection failed");
    return;
  }

  // JSON request body (the text is inserted as-is, without escaping)
  String body = String("{") +
    "\"model\":\"" + TTS_MODEL + "\"," +
    "\"voice\":\"" + TTS_VOICE + "\"," +
    "\"response_format\":\"mp3\"," +
    "\"input\":\"" + text + "\"" +
    "}";

  client.println("POST /v1/audio/speech HTTP/1.1");
  client.println("Host: " + String(openaiHost));
  client.println("Authorization: Bearer " + String(openaiApiKey));
  client.println("Content-Type: application/json");
  client.print("Content-Length: ");
  client.println(body.length());
  client.println();
  client.print(body);

  // ---- Skip HTTP headers ----
  while (client.connected()) {
    String line = client.readStringUntil('\n');
    if (line == "\r") break;
  }
}

void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Warning);

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  auto config = i2s.defaultConfig(TX_MODE);
  config.pin_bck = MAX_BCLK;
  config.pin_ws = MAX_LRC;
  config.pin_data = MAX_DIN;
  i2s.begin(config);

  mp3decode.begin();
  volume.begin(config);
  volume.setVolume(TTS_VOLUME);

  text2speech("Hello, this is a test for text to speech.");
}

void loop() {
  copier.copy();
}
Imports
The code begins by including several libraries. Arduino.h is the core Arduino library. WiFi.h and WiFiClientSecure.h provide WiFi connectivity and secure HTTPS client functionality. The AudioTools library and its MP3 codec CodecMP3Helix are used for audio streaming and decoding.
#include <Arduino.h>
#include <WiFi.h>
#include <WiFiClientSecure.h>
#include "AudioTools.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
Constants and Pin Definitions
Next, the pins used for the I2S interface to the MAX98357 amplifier are defined. These pins correspond to the data input (MAX_DIN), word select or left-right clock (MAX_LRC), and bit clock (MAX_BCLK).
// MAX98357 I2S pins
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25
Additionally, constants for the TTS model, voice, and volume level are set.
// Text to Speech
#define TTS_MODEL "gpt-4o-mini-tts"
#define TTS_VOICE "marin"
#define TTS_VOLUME 0.6
You can try other TTS models, such as “tts-1”, and other voices, such as “alloy”, “ash”, “coral”, “echo”, “fable”, “onyx”, “nova”, “sage”, “shimmer”, “marin” and “cedar”. For more details, have a look at platform.openai.com/docs/guides/text-to-speech.
WiFi credentials and OpenAI API details are also stored as constant strings. You will have to replace “ssid” and “pwd” with the credentials for your WiFi.
// WiFi credentials
const char* ssid = "ssid";
const char* password = "pwd";
You will also need an API key from OpenAI to replace the placeholder “apikey”. Go to https://platform.openai.com and sign up with an email address or an existing Google or Microsoft account.
// OpenAI configuration
const char* openaiHost = "api.openai.com";
const int openaiPort = 443;
const char* openaiApiKey = "apikey";
After verifying your email and completing the initial setup, log in to the OpenAI dashboard, platform.openai.com/api-keys and find or create your API Key (=SECRET KEY) as shown below:

The API Key is a unique, long string, starting with “sk-proj-” that is needed to authenticate your API requests (see below).
sk-proj-xcA.......................OtDu0U
That is all you need to get an API key but I recommend you set a usage limit for your account as well. For more details see the Vision Chatbot with DFRobot ESP32-S3 AI Camera and OpenAI tutorial.
Audio and Network Objects
Several objects are instantiated to manage audio streaming and network communication. WiFiClientSecure handles the HTTPS connection to the OpenAI API. I2SStream manages the I2S audio output. VolumeStream wraps the I2S stream to control the audio volume. EncodedAudioStream decodes the MP3 data with the Helix MP3 decoder and writes the decoded audio to the volume stream. Finally, StreamCopy copies the MP3 data received by the network client into the decoder (the first constructor argument is the destination, the second the source).
WiFiClientSecure client;
I2SStream i2s;
VolumeStream volume(i2s);
EncodedAudioStream mp3decode(&volume, new MP3DecoderHelix());
StreamCopy copier(mp3decode, client);
Text-to-Speech Function
The text2speech() function sends a text string to the OpenAI TTS API and prepares the client stream for playback. It first calls setInsecure(), which disables verification of the server's TLS certificate. This is convenient for development, but it means the identity of the server is not checked. It then connects to the OpenAI server (api.openai.com) on port 443. If the connection fails, it prints an error message and returns.
The function builds a JSON body specifying the TTS model, voice, output format (MP3), and the input text. It sends an HTTP POST request with the appropriate headers including authorization using the API key. After sending the request body, it reads and skips the HTTP response headers to position the client stream at the start of the MP3 audio data.
void text2speech(const char* text) {
  client.setInsecure();  // skip TLS certificate verification
  if (!client.connect(openaiHost, openaiPort)) {
    Serial.println("Connection failed");
    return;
  }

  // JSON request body (the text is inserted as-is, without escaping)
  String body = String("{") +
    "\"model\":\"" + TTS_MODEL + "\"," +
    "\"voice\":\"" + TTS_VOICE + "\"," +
    "\"response_format\":\"mp3\"," +
    "\"input\":\"" + text + "\"" +
    "}";

  client.println("POST /v1/audio/speech HTTP/1.1");
  client.println("Host: " + String(openaiHost));
  client.println("Authorization: Bearer " + String(openaiApiKey));
  client.println("Content-Type: application/json");
  client.print("Content-Length: ");
  client.println(body.length());
  client.println();
  client.print(body);

  // ---- Skip HTTP headers ----
  while (client.connected()) {
    String line = client.readStringUntil('\n');
    if (line == "\r") break;
  }
}
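Two details of text2speech() are worth hardening. The text is pasted into the JSON body unescaped, so a quotation mark, backslash or line break in the text produces invalid JSON. And the header loop discards the status line, so an error response (for example 401 for an invalid API key) would be passed to the MP3 decoder as if it were audio. The host-side C++ sketch below shows two hypothetical helpers for this, jsonEscape() and httpStatusCode(); they are not part of arduino-audio-tools or of the OpenAI API.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a string for use inside a JSON string literal.
std::string jsonEscape(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {  // other control characters become \u00XX
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", c);
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Extract the numeric status code from an HTTP status line such as
// "HTTP/1.1 200 OK\r". Returns -1 if the line does not look like one.
int httpStatusCode(const std::string& statusLine) {
  if (statusLine.compare(0, 5, "HTTP/") != 0) return -1;
  const size_t space = statusLine.find(' ');
  if (space == std::string::npos || space + 4 > statusLine.size()) return -1;
  int code = 0;
  for (size_t i = space + 1; i < space + 4; i++) {
    const char c = statusLine[i];
    if (c < '0' || c > '9') return -1;
    code = code * 10 + (c - '0');
  }
  return code;
}
```

On the ESP32 you could, for instance, build the input field from jsonEscape(text).c_str(), and read the first response line with client.readStringUntil('\n') and check that httpStatusCode() returns 200 before skipping the remaining headers.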
Setup Function
The setup() function initializes serial communication for debugging and sets the audio logger to show warnings. It then connects to the specified WiFi network, waiting until the connection is established.
After WiFi is connected, the I2S interface is configured for transmission mode with the pins defined earlier. The I2S stream, MP3 decoder, and volume control are initialized. The volume is set to the predefined level.
Finally, the text2speech() function is called with a sample text string to start streaming and playing the synthesized speech.
void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Warning);

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  auto config = i2s.defaultConfig(TX_MODE);
  config.pin_bck = MAX_BCLK;
  config.pin_ws = MAX_LRC;
  config.pin_data = MAX_DIN;
  i2s.begin(config);

  mp3decode.begin();
  volume.begin(config);
  volume.setVolume(TTS_VOLUME);

  text2speech("Hello, this is a test for text to speech.");
}
Loop Function
The loop() function continuously copies the MP3 data from the network client into the MP3 decoder, which decodes it and passes the audio through the volume control to the I2S output. Playback continues as long as the network client delivers data.
void loop() {
  copier.copy();
}
Internet radio
This example demonstrates how to implement a simple Web radio. It connects to a WiFi network, streams internet radio in MP3 format, decodes the audio, adjusts the volume, and outputs the sound through the MAX98357 digital amplifier.
/*
  www.makerguides.com

  Libraries:
  - ESP32 Core 2.0.17
  - [arduino-audio-tools](https://github.com/pschatzmann/arduino-audio-tools)
    Version: 1.2.2
  - [arduino-libhelix](https://github.com/pschatzmann/arduino-libhelix)
    Version: 0.9.2
*/

#include <Arduino.h>
#include <WiFi.h>
#include <Wire.h>
#include "AudioTools.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
#include "AudioTools/Communication/HTTP/ICYStream.h"

// MAX98357
#define MAX_DIN 33   // serial data
#define MAX_LRC 32   // word select
#define MAX_BCLK 25  // serial clock
#define MAX_VOL 0.5  // Volume

const char* ssid = "ssid";
const char* password = "pwd";
const char* url = "https://jazz.stream.laut.fm/jazz";

ICYStream icystream;
I2SStream i2s;
VolumeStream volume(i2s);
EncodedAudioStream mp3decode(&volume, new MP3DecoderHelix());
StreamCopy copier(mp3decode, icystream);

void callbackMetadata(MetaDataType type, const char* str, int len) {
  Serial.printf("%s: %s\n", toStr(type), str);
}

void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Warning);

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  auto config = i2s.defaultConfig(TX_MODE);
  config.pin_bck = MAX_BCLK;
  config.pin_ws = MAX_LRC;
  config.pin_data = MAX_DIN;
  i2s.begin(config);

  volume.begin(config);
  volume.setVolume(MAX_VOL);

  mp3decode.begin();
  icystream.begin(url);
  icystream.setMetadataCallback(callbackMetadata);
}

void loop() {
  copier.copy();
}
Imports
The code begins by including the libraries needed for WiFi connectivity and audio processing. The Arduino.h and WiFi.h libraries provide the basic Arduino and WiFi functions. The Wire.h library (for I2C communication) is included as well, but this sketch does not use I2C, so you can remove that line. The AudioTools library and its related components handle audio streaming, decoding, and playback.
#include <Arduino.h>
#include <WiFi.h>
#include <Wire.h>
#include "AudioTools.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
#include "AudioTools/Communication/HTTP/ICYStream.h"
Constants
Next, the pins for the MAX98357 amplifier are defined. These pins correspond to the I2S signals: MAX_DIN for serial data input, MAX_LRC for word select (left-right clock), and MAX_BCLK for the serial clock. The volume is set as a floating-point value between 0 and 1, where MAX_VOL is 0.5, representing 50% volume.
#define MAX_DIN 33   // serial data
#define MAX_LRC 32   // word select
#define MAX_BCLK 25  // serial clock
#define MAX_VOL 0.5  // Volume
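A volume setting between 0 and 1 acts as a scale factor on the audio samples. The host-side C++ sketch below shows the simplest form, a linear scaling of 16-bit PCM samples, plus the corresponding attenuation in dB. It is an illustration with hypothetical function names only; VolumeStream in arduino-audio-tools may map the setting to the output level differently, so check the library documentation if the exact curve matters.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale one 16-bit PCM sample by a linear volume factor in [0, 1].
// Since |sample * volume| <= |sample|, the result always fits in int16_t.
int16_t applyVolume(int16_t sample, float volume) {
  assert(volume >= 0.0f && volume <= 1.0f);
  return static_cast<int16_t>(std::lround(sample * volume));
}

// Attenuation in dB that corresponds to a linear volume factor (> 0).
double volumeToDb(double volume) {
  assert(volume > 0.0);
  return 20.0 * std::log10(volume);
}
```

With a linear factor, MAX_VOL = 0.5 halves every sample, which corresponds to about -6 dB.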
WiFi Credentials and Stream URL
The WiFi network credentials are stored in the ssid and password constants. The url constant holds the address of the internet radio stream to be played, in this case a jazz stream.
const char* ssid = "ssid";
const char* password = "pwd";
const char* url = "https://jazz.stream.laut.fm/jazz";
Here are a few more URLs of internet radio streams you can try out:
"https://jazz.stream.laut.fm/jazz"
"http://vis.media-ice.musicradio.com/CapitalMP3"
"http://stream.srg-ssr.ch/m/rsj/mp3_128"
"http://stream.live.vc.bbcmedia.co.uk/bbc_world_service"
"http://icecast.omroep.nl/radio1-bb-mp3"
"http://stream-02-eu.relaxingjazz.com/stream/1/"
Audio Objects
Several objects are instantiated to form the audio pipeline. The ICYStream object fetches the internet radio stream over HTTP and extracts the metadata embedded in it. The I2SStream object manages the I2S audio output to the amplifier. The VolumeStream object wraps the I2S stream to control the audio volume. The EncodedAudioStream object decodes the MP3 data with the MP3DecoderHelix codec and writes the decoded audio into the volume stream. Finally, the StreamCopy object copies the encoded data from the ICY stream (its second argument, the source) into the MP3 decoder (its first argument, the destination).
ICYStream icystream;
I2SStream i2s;
VolumeStream volume(i2s);
EncodedAudioStream mp3decode(&volume, new MP3DecoderHelix());
StreamCopy copier(mp3decode, icystream);
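To make the direction of the data flow concrete, here is a library-free model of the same idea: a copier pulls a chunk from a source and writes it into a sink, and a volume stage scales 16-bit samples before forwarding them to the output. All names (`Sink`, `RecordingSink`, `VolumeStage`, `BufferSource`, `copyOnce`) are hypothetical and are not the AudioTools API; the MP3 decoder stage is left out, and linear scaling is an assumption (the library's own volume curve may differ).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sink interface: something that accepts 16-bit PCM samples.
struct Sink {
  virtual void write(const int16_t* samples, size_t count) = 0;
  virtual ~Sink() = default;
};

// Stand-in for the I2S output: it simply records every sample it receives.
struct RecordingSink : Sink {
  std::vector<int16_t> received;
  void write(const int16_t* samples, size_t count) override {
    received.insert(received.end(), samples, samples + count);
  }
};

// Stand-in for a volume stage: scales each sample, then forwards it.
struct VolumeStage : Sink {
  Sink& next;
  float volume;  // 0.0 = silent, 1.0 = unchanged (linear scaling, an assumption)
  VolumeStage(Sink& n, float v) : next(n), volume(v) {
    assert(v >= 0.0f && v <= 1.0f);
  }
  void write(const int16_t* samples, size_t count) override {
    std::vector<int16_t> scaled(count);
    for (size_t i = 0; i < count; ++i) {
      scaled[i] = static_cast<int16_t>(samples[i] * volume);
    }
    next.write(scaled.data(), count);
  }
};

// Hypothetical source: hands out samples from an in-memory buffer.
struct BufferSource {
  std::vector<int16_t> data;
  size_t pos = 0;
  size_t read(int16_t* out, size_t maxCount) {
    size_t n = std::min(maxCount, data.size() - pos);
    std::copy(data.begin() + pos, data.begin() + pos + n, out);
    pos += n;
    return n;
  }
};

// One copy step: pull one chunk from the source and push it into the sink.
// Returns the number of samples copied (0 once the source is exhausted).
size_t copyOnce(BufferSource& from, Sink& to, size_t chunk = 4) {
  std::vector<int16_t> buf(chunk);
  size_t n = from.read(buf.data(), chunk);
  if (n > 0) to.write(buf.data(), n);
  return n;
}
```

Calling `copyOnce()` repeatedly until it returns 0 mirrors what `copier.copy()` in `loop()` does one chunk at a time: with a volume of 0.5, a sample of 1000 arrives at the output as 500.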
Metadata Callback Function
The callbackMetadata() function is defined to handle metadata received from the internet radio stream, such as song titles or artist information. It prints the metadata type and content to the serial monitor for debugging or informational purposes.
void callbackMetadata(MetaDataType type, const char* str, int len) {
Serial.printf("%s: %s\n", toStr(type), str);
}
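Internet radio servers that use the ICY (Shoutcast/Icecast) protocol embed metadata in the stream as short text blocks such as `StreamTitle='Artist - Song';StreamUrl='';`. ICYStream already parses these and hands the result to the callback, so you do not need to do this yourself. The hypothetical helper `extractStreamTitle()` below only illustrates what a raw block looks like and how the title is pulled out of it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the value of StreamTitle='...' from a raw
// ICY metadata block, or an empty string if the field is missing.
// This simple version stops at the first "';" after the title starts.
std::string extractStreamTitle(const std::string& block) {
  const std::string key = "StreamTitle='";
  size_t start = block.find(key);
  if (start == std::string::npos) return "";
  start += key.size();
  size_t end = block.find("';", start);
  if (end == std::string::npos) return "";
  return block.substr(start, end - start);
}
```

For the block `StreamTitle='Miles Davis - So What';StreamUrl='';` the helper returns `Miles Davis - So What`, which is the kind of string the metadata callback prints.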
Setup Function
In the setup() function, serial communication is initialized at 115200 baud to allow logging and debugging output. The audio logger is configured to output warnings and above to the serial monitor.
The ESP32 then attempts to connect to the specified WiFi network, repeatedly checking the connection status every 500 milliseconds until successful.
After connecting, the I2S configuration is obtained using the default transmit mode. The I2S pins are assigned to the previously defined constants for bit clock, word select, and data input. The I2S interface and volume control are initialized with this configuration, and the volume is set to 50%.
The MP3 decoder is started, and the internet radio stream is initialized with the provided URL. The metadata callback function is registered to handle incoming metadata during playback.
void setup() {
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Warning);
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
}
auto config = i2s.defaultConfig(TX_MODE);
config.pin_bck = MAX_BCLK;
config.pin_ws = MAX_LRC;
config.pin_data = MAX_DIN;
i2s.begin(config);
volume.begin(config);
volume.setVolume(MAX_VOL);
mp3decode.begin();
icystream.begin(url);
icystream.setMetadataCallback(callbackMetadata);
}
Loop Function
The loop() function continuously copies audio data from the internet radio stream through the MP3 decoder and volume controller to the I2S output. This process keeps the audio playback running indefinitely.
void loop() {
copier.copy();
}
If you want to add more functionality, such as a volume controller, see our Playing Audio with ESP32 and PCM5102A tutorial. And if you want to add a display, see the Internet Radio with ESP32 and MAX98357A tutorial.
Play MP3 from SD card
This example demonstrates how to play MP3 audio files stored on an SD card using an ESP32, the MAX98357 I2S amplifier, and the AudioTools library. The code initializes the audio hardware, sets up the MP3 decoder, and continuously streams audio data to the amplifier.
/*
www.makerguides.com
Libraries:
- ESP32 Core 2.0.17
- [arduino-audio-tools](https://github.com/pschatzmann/arduino-audio-tools)
Version: 1.2.2
- [arduino-libhelix](https://github.com/pschatzmann/arduino-libhelix)
Version: 0.9.2
*/
#include "AudioTools.h"
#include "AudioTools/Disk/AudioSourceSD.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
// MAX98357
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25
#define PATH "/"
#define EXT "mp3"
AudioSourceSD source(PATH, EXT);
I2SStream i2s;
MP3DecoderHelix decoder;
AudioPlayer player(source, i2s, decoder);
void printMetaData(MetaDataType type, const char* str, int len){
Serial.printf("%s: %s\n", toStr(type), str);
}
void setup() {
Serial.begin(115200);
AudioToolsLogger.begin(Serial, AudioToolsLogLevel::Warning);
auto cfg = i2s.defaultConfig(TX_MODE);
cfg.pin_bck = MAX_BCLK;
cfg.pin_ws = MAX_LRC;
cfg.pin_data = MAX_DIN;
i2s.begin(cfg);
//source.setFileFilter("*Bob Dylan*");
player.setMetadataCallback(printMetaData);
player.setVolume(0.4);
player.begin();
}
void loop() {
player.copy();
}
Imports
The code begins by including the header files from the AudioTools library. These provide the necessary classes and functions for handling audio sources, decoding MP3 files, and streaming audio data via I2S.
#include "AudioTools.h"
#include "AudioTools/Disk/AudioSourceSD.h"
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
Constants
Next, the pins connected to the MAX98357 amplifier are defined. These specify the I2S data input (MAX_DIN), word select or left-right clock (MAX_LRC), and bit clock (MAX_BCLK) pins on the ESP32.
#define MAX_DIN 33
#define MAX_LRC 32
#define MAX_BCLK 25
Additionally, constants for the audio source path and file extension are set. Here, PATH is the root directory on the SD card, and EXT specifies that only MP3 files will be considered.
#define PATH "/"
#define EXT "mp3"
Objects
Several objects are instantiated to manage the audio playback pipeline. AudioSourceSD represents the SD card audio source, filtering files by the specified path and extension. I2SStream handles the I2S audio output stream. MP3DecoderHelix is the MP3 decoder based on the Helix codec. Finally, AudioPlayer ties these components together to manage playback.
AudioSourceSD source(PATH, EXT);
I2SStream i2s;
MP3DecoderHelix decoder;
AudioPlayer player(source, i2s, decoder);
Metadata Callback Function
The function printMetaData() is defined to handle metadata information such as artist or track title. It receives the metadata type and string, then prints it to the serial console for debugging or informational purposes.
void printMetaData(MetaDataType type, const char* str, int len){
Serial.printf("%s: %s\n", toStr(type), str);
}
Setup Function
In the setup() function, serial communication is initialized at 115200 baud for logging. The AudioTools logger is also started with a warning level to capture important messages.
The I2S configuration is obtained from the default settings for transmit mode. The pins for bit clock, word select, and data are assigned to the previously defined constants corresponding to the MAX98357 connections. The I2S stream is then initialized with this configuration.
Optionally, a file filter can be set on the audio source to play only files matching a pattern (commented out in this example). The metadata callback is set to the printMetaData() function to receive metadata during playback. The volume is set to 40% to control output loudness. Finally, the audio player is started.
void setup() {
Serial.begin(115200);
AudioToolsLogger.begin(Serial, AudioToolsLogLevel::Warning);
auto cfg = i2s.defaultConfig(TX_MODE);
cfg.pin_bck = MAX_BCLK;
cfg.pin_ws = MAX_LRC;
cfg.pin_data = MAX_DIN;
i2s.begin(cfg);
//source.setFileFilter("*Bob Dylan*");
player.setMetadataCallback(printMetaData);
player.setVolume(0.4);
player.begin();
}
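The exact matching rules of the extension filter and of `setFileFilter()` are defined by the library. As an illustration only, here is a hypothetical `*`-wildcard matcher plus an extension check, showing which file names a filter like `*Bob Dylan*` combined with the `mp3` extension would select. The helper names are invented, and case-sensitive matching is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: true if `name` ends with "." + ext (case-sensitive here).
bool hasExtension(const std::string& name, const std::string& ext) {
  const std::string suffix = "." + ext;
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical glob matcher: '*' matches any run of characters (including
// none); every other character must match exactly.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
  size_t p = 0, t = 0;
  size_t starP = std::string::npos, starT = 0;
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      starP = p++;  // remember the star; first let it match nothing
      starT = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (starP != std::string::npos) {
      p = starP + 1;  // backtrack: the last star absorbs one more character
      t = ++starT;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
  return p == pattern.size();
}
```

A file would be played only if both checks pass, for example `hasExtension(name, "mp3") && wildcardMatch("*Bob Dylan*", name)`.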
Loop Function
The loop() function repeatedly calls player.copy(). Each call moves one buffer of audio from the SD card through the decoder and out via I2S to the amplifier, so loop() has to run often. Avoid long delay() calls or other blocking code in loop(), otherwise playback will stutter.
void loop() {
player.copy();
}
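A quick calculation shows how little playback time one buffer of decoded (PCM) audio covers, which is why loop() should not be blocked for long. The helper name is hypothetical, and the 1024-byte buffer and 44.1 kHz, 16-bit stereo format are example assumptions, not values taken from the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: playback time in milliseconds of `bytes` of raw PCM.
double bufferDurationMs(size_t bytes, unsigned sampleRate,
                        unsigned bitsPerSample, unsigned channels) {
  assert(sampleRate > 0 && bitsPerSample % 8 == 0 && channels > 0);
  // bytes per second = sample rate * bytes per sample * channels
  double bytesPerSecond =
      static_cast<double>(sampleRate) * (bitsPerSample / 8) * channels;
  return 1000.0 * bytes / bytesPerSecond;
}
```

For 1024 bytes at 44.1 kHz, 16-bit stereo this gives about 5.8 ms, so a loop() that blocks for much longer than the audio already queued for I2S output will cause audible gaps.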
If you want to add more functionality, such as a volume controller and buttons to skip tracks, see the Playing Audio with ESP32 and PCM5102A tutorial.
Play Audio from Bluetooth
This example demonstrates how to stream Bluetooth audio. The code sets up the I2S interface with specific pins and initializes a Bluetooth A2DP sink, allowing the ESP32 to receive and play audio from Bluetooth devices.
/*
www.makerguides.com
Libraries:
- ESP32 Core 2.0.17
- [arduino-audio-tools](https://github.com/pschatzmann/arduino-audio-tools)
Version: 1.2.2
- [arduino-libhelix](https://github.com/pschatzmann/arduino-libhelix)
Version: 0.9.2
- [ESP32-A2DP](https://github.com/pschatzmann/ESP32-A2DP)
Version: 1.8.8
*/
#include "AudioTools.h"
#include "BluetoothA2DPSink.h"
#define MAX_DIN 33 // serial data
#define MAX_LRC 32 // word select
#define MAX_BCLK 25 // serial clock
I2SStream i2s;
BluetoothA2DPSink a2dp_sink(i2s);
void setup() {
auto cfg = i2s.defaultConfig();
cfg.pin_bck = MAX_BCLK;
cfg.pin_ws = MAX_LRC;
cfg.pin_data = MAX_DIN;
i2s.begin(cfg);
a2dp_sink.start("MyMusic");
}
void loop() { }
Imports
The code starts by including two important libraries. The AudioTools.h library provides tools for handling audio streams and configuring the I2S interface. The BluetoothA2DPSink.h library enables the ESP32 to act as a Bluetooth A2DP sink, which means it can receive audio streams from Bluetooth sources like smartphones.
#include "AudioTools.h"
#include "BluetoothA2DPSink.h"
Constants
Next, three constants are defined to specify the GPIO pins used for the I2S interface. These pins connect the ESP32 to the MAX98357 amplifier. MAX_DIN is the serial data input pin, MAX_LRC is the word select or left-right clock pin, and MAX_BCLK is the serial clock pin.
#define MAX_DIN 33  // serial data
#define MAX_LRC 32  // word select
#define MAX_BCLK 25 // serial clock
Objects
An I2SStream object named i2s is created to manage the I2S audio stream. Then, a BluetoothA2DPSink object called a2dp_sink is instantiated, passing the i2s object to it. This setup links the Bluetooth audio input directly to the I2S output, enabling seamless audio playback through the amplifier.
I2SStream i2s;
BluetoothA2DPSink a2dp_sink(i2s);
Setup function
Inside the setup() function, the I2S interface is configured and started. First, the default I2S configuration is obtained by calling i2s.defaultConfig(). Then, the pin assignments for the bit clock (pin_bck), word select (pin_ws), and data input (pin_data) are set to the previously defined constants. Finally, the I2S interface is initialized with this configuration by calling i2s.begin(cfg).
After setting up I2S, the Bluetooth A2DP sink is started with the device name "MyMusic". This name will appear when other Bluetooth devices scan for available audio sinks.
void setup() {
auto cfg = i2s.defaultConfig();
cfg.pin_bck = MAX_BCLK;
cfg.pin_ws = MAX_LRC;
cfg.pin_data = MAX_DIN;
i2s.begin(cfg);
a2dp_sink.start("MyMusic");
}
To try this, open the Bluetooth settings on your mobile phone (often called “Connected devices” or “Bluetooth devices”), search for the “MyMusic” device, connect to it, and then play some music. You should hear it playing through your ESP32 and MAX98357.
Loop function
The loop() function is empty because all the audio streaming and playback are handled asynchronously by the Bluetooth A2DP sink and the I2S interface. Once started, the ESP32 continuously listens for Bluetooth audio streams and outputs them via I2S to the amplifier without requiring further code in the main loop.
void loop() { }
Conclusions
In this project, you learned how to play audio using the ESP32 and the MAX98357 amplifier. We explored the technical details of the MAX98357A module and how to wire it to the ESP32 for mono or stereo sound. You also learned how to convert text to speech, stream internet radio, play MP3 files from an SD Card and play audio via Bluetooth.
If you want to add more functionality, such as a volume controller and buttons to skip tracks, see our Playing Audio with ESP32 and PCM5102A tutorial. Similarly, if you need more background on the SD Card Reader module used here, have a look at the SD Card Module with ESP32 tutorial.
For better and louder sound, you can use a PCM5102A DAC together with a separate amplifier. See the following tutorials for more information:
| Tutorial | Output Wattage | Power Supply |
| --- | --- | --- |
| TDA7379 Class AB Audio Amplifier with ESP32 | 2 × 38 W | 9 .. 15 V |
| High-Power ESP32 Audio with TPA3116D2 and PCM5102 | 2 × 50 W | 4.5 .. 26 V |
| Audio with PAM8403, PCM5102 and ESP32 | 2 × 3 W | 2.5 .. 5.5 V |
| Stereo Amplifier with TPA3110 XH-A232, PCM5102 and ESP32 | 2 × 30 W | 8 .. 26 V |
| Playing Audio with ESP32 and MAX98357 | 1 × 3 W | 3.3 .. 5 V |
| Playing Audio with ESP32 and PCM5102A | line level | 3.3 .. 5 V |
| Playing Sound with PAM8403 and ESP32 | 2 × 3 W | 2.5 .. 5.5 V |
| Audio with YDA138-E Amplifier, PCM5102 and ESP32 | 2 × 20 W | 9 .. 13.5 V |
If you have any questions, feel free to leave them in the comment section.
Happy Tinkering ; )
FAQ
Q: What is the MAX98357A and why should I use it?
The MAX98357A is a digital audio amplifier with a built-in DAC. It takes digital audio directly from the ESP32 via I2S and outputs an already amplified signal that can drive a speaker without an extra amplifier. This keeps the circuit simple and gives better sound quality than the ESP32's internal 8-bit DAC.
Q: How is the MAX98357A connected to the ESP32?
The module uses three I2S signals plus power.
Typical wiring:
ESP32_3V3 or 5V ----> VIN
ESP32_GND ---------> GND
ESP32_GPIO25 -----> BCLK
ESP32_GPIO32 -----> LRC
ESP32_GPIO33 -----> DIN
The ESP32 acts as I2S master and sends audio data, while the MAX98357A receives and converts it to sound.
Q: Can I connect a speaker directly to the MAX98357A?
Yes. The MAX98357A already includes an amplifier stage, so you can connect a speaker directly to its output terminals. The output is bridge-tied, so do not connect either speaker terminal to GND.
SPK+ ---- Speaker ---- SPK-
Typical speakers are 4Ω to 8Ω, rated 3W or higher. The amplifier delivers about 3W into 4Ω at a 5V supply, which is enough for small speakers.
Q: How do I get stereo sound?
The MAX98357A is mono, so for stereo you need two modules.
ESP32 --> 2x MAX98357A --> 2x Speakers
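Both modules share the same I2S signals. Each module selects its channel from the voltage on its SD pin, which you set with a pull-up resistor that forms a voltage divider with the chip's internal pulldown:
SD -- 100kΩ -- 3.3V (left channel)
SD -- 210kΩ -- 3.3V (right channel)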
This way each amplifier plays one channel.
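The resistor values follow from a simple voltage divider. The sketch below assumes an internal SD_MODE pulldown of about 100 kΩ and the approximate SD_MODE thresholds from the MAX98357A datasheet (above 1.4 V left, 0.77 to 1.4 V right, 0.16 to 0.77 V mixed (L+R)/2, below that shutdown); confirm both against the datasheet revision you use. The function names are hypothetical. Some breakout boards may already have their own pull-up on SD, which sits in parallel with yours, so measuring the actual voltage is a good check.

```cpp
#include <cassert>
#include <string>

// Voltage on SD_MODE when an external pull-up (rUpOhms) to vddio forms a
// divider with the chip's internal pulldown (assumed ~100 kOhm).
double sdModeVoltage(double rUpOhms, double vddio,
                     double rPulldownOhms = 100e3) {
  assert(rUpOhms >= 0 && rPulldownOhms > 0);
  return vddio * rPulldownOhms / (rUpOhms + rPulldownOhms);
}

// Map the SD_MODE voltage to the output mode using the approximate
// datasheet thresholds quoted in the text (check your datasheet revision).
std::string channelMode(double v) {
  if (v > 1.4) return "left";
  if (v > 0.77) return "right";
  if (v > 0.16) return "(L+R)/2";
  return "shutdown";
}
```

With 3.3V, 100kΩ gives 1.65V (left) and 210kΩ gives about 1.06V (right), matching the wiring above.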
Q: What power supply should I use?
The module works with 3.3V or 5V, but the power supply matters for good sound. At high volume the module can draw several hundred milliamps, so powering it from the ESP32's 3.3V pin is often not enough and may cause distortion or resets.
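To put numbers on the current draw, a rough estimate of the average supply current is output power divided by efficiency and supply voltage. The 90% efficiency used below is an assumption in the range class D amplifiers like this one typically reach, and the helper name is hypothetical; real current also has short peaks above this average.

```cpp
#include <cassert>

// Rough average supply current (A) for a given audio output power.
// efficiency = fraction of supply power that reaches the speaker (assumed).
double supplyCurrentA(double outputPowerW, double supplyV,
                      double efficiency = 0.9) {
  assert(supplyV > 0 && efficiency > 0 && efficiency <= 1);
  double inputPowerW = outputPowerW / efficiency;
  return inputPowerW / supplyV;
}
```

For 3 W at 5 V this gives about 0.67 A per module, so two modules for stereo need roughly twice that.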
Q: How can I improve sound quality with power supply filtering?
A clean power supply is critical, because noise directly affects the audio output. You can add capacitors:
VIN -- 100nF -- GND
VIN -- 100µF -- GND
The small capacitor removes fast switching noise, while the large capacitor smooths voltage ripple and prevents drops during loud audio peaks.
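In addition, you can add an RC low-pass filter (a series resistor followed by a capacitor to ground) in the amplifier's supply line:
5V -- 10Ω --+-- AMP_VIN
            |
          220µF
            |
           GND
This reduces noise from USB or switching regulators. Keep in mind that the series resistor drops I × R volts: at 100 mA, 10 Ω already loses 1 V, so this filter only suits modest currents. For loud playback, use a much smaller resistor.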
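The corner frequency of such a first-order RC filter is fc = 1 / (2πRC); noise above it is attenuated. The hypothetical helpers below compute that frequency and the DC voltage lost across the series resistor for the 10 Ω / 220 µF example.

```cpp
#include <cassert>

// Corner frequency (Hz) of a first-order RC low-pass: fc = 1 / (2*pi*R*C).
double rcCutoffHz(double rOhms, double cFarads) {
  assert(rOhms > 0 && cFarads > 0);
  const double kPi = 3.14159265358979323846;
  return 1.0 / (2.0 * kPi * rOhms * cFarads);
}

// DC voltage dropped across the series resistor at a given load current.
double seriesDropV(double rOhms, double currentA) {
  return rOhms * currentA;
}
```

For 10 Ω and 220 µF the corner is about 72 Hz, and at 100 mA the resistor drops 1 V.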
Q: Should I use an external power supply?
Yes, especially for higher volume. At loud playback the MAX98357A can draw several hundred milliamps, and the ESP32's regulator may not supply this reliably.
External 5V ----> MAX98357A
ESP32_GND ------> Shared GND
This improves stability and reduces distortion.
Q: How can I increase volume?
Volume depends on three main factors: supply voltage, amplifier gain, and the speaker. To increase volume:
- Use 5V instead of 3.3V
- Raise the amplifier gain with the GAIN pin
- Use a lower impedance speaker (4Ω instead of 8Ω)
A 4Ω speaker draws more power and produces higher volume, but also increases load on the amplifier.
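An idealized estimate shows why supply voltage and impedance matter: a bridge-tied output like the MAX98357A's can swing a sine wave with a peak of roughly the supply voltage across the speaker, so the maximum power is about V² / (2R). Real output is lower because of losses and clipping, so treat these numbers as upper bounds for comparison, not datasheet values. The helper name is hypothetical.

```cpp
#include <cassert>

// Idealized maximum sine-wave power (W) of a bridge-tied (BTL) output:
// peak voltage across the speaker ~ supply voltage, so P = Vdd^2 / (2 * R).
// Ignores switch resistance and other losses.
double idealMaxPowerW(double supplyV, double speakerOhms) {
  assert(supplyV > 0 && speakerOhms > 0);
  return supplyV * supplyV / (2.0 * speakerOhms);
}
```

This gives about 3.1 W for 5 V into 4 Ω, about 1.4 W for 3.3 V into 4 Ω, and about 1.6 W for 5 V into 8 Ω.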
Q: What is the GAIN pin and how does it affect sound?
The GAIN pin sets the amplifier gain. According to the MAX98357A datasheet, the settings are:
GAIN -- 100kΩ -- GND : 15 dB
GAIN -- GND : 12 dB
GAIN -- FLOAT : 9 dB (default)
GAIN -- VDD : 6 dB
GAIN -- 100kΩ -- VDD : 3 dB
Higher gain increases volume but can also increase noise and distortion, so it is often better to start with a lower gain and adjust the volume in software.
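Gain is specified in decibels. The MAX98357A's settings range from 3 dB to 15 dB in 3 dB steps; converting with factor = 10^(dB / 20) shows what each step means for the output voltage. The helper name is hypothetical.

```cpp
#include <cmath>

// Convert a voltage gain in dB to a linear factor: 10^(dB / 20).
double dbToVoltageFactor(double db) {
  return std::pow(10.0, db / 20.0);
}
```

Each 3 dB step raises the output voltage by about 1.41×, and going from the 9 dB default to 15 dB roughly doubles it, which takes about four times the power from the supply for the same signal.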
Q: Why do I hear noise or hiss even when no audio is playing?
Noise can come from several sources such as power supply ripple, bad grounding, or interference from the ESP32. Common fixes:
- Add decoupling capacitors
- Use a clean external power supply
- Keep wires short
Q: What wiring practices improve sound quality?
Good wiring is very important, especially for digital audio and switching amplifiers. Keep wires short:
ESP32 --> MAX98357A (short lines)
and use star grounding:
ESP32_GND ----+---- AMP_GND
              |
          POWER_GND
Avoid running speaker wires near signal wires, because the high current switching signals can introduce noise.
Q: What audio formats and features are supported?
The MAX98357A supports typical digital audio formats used with ESP32.
- 16-bit, 24-bit, or 32-bit audio
- Sample rates from 8 kHz to 96 kHz
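On the I2S bus, the word-select (LRC) clock runs at the sample rate and the bit clock (BCLK) at sample rate × bits per sample × channels. The hypothetical helper below computes BCLK for formats in the supported range; it assumes each channel slot is exactly as wide as the sample, which is an assumption about the ESP32's I2S configuration rather than something stated here.

```cpp
#include <cassert>
#include <cstdint>

// I2S bit clock (Hz): one clock per bit, for every channel, in every
// frame; there is one frame per sample period (one LRC cycle).
uint32_t i2sBitClockHz(uint32_t sampleRateHz, uint32_t bitsPerSample,
                       uint32_t channels = 2) {
  assert(sampleRateHz > 0 && bitsPerSample > 0 && channels > 0);
  return sampleRateHz * bitsPerSample * channels;
}
```

For example, 44.1 kHz, 16-bit stereo (a common Bluetooth A2DP format) needs a BCLK of 1.4112 MHz, while 96 kHz, 32-bit stereo needs 6.144 MHz.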
Stefan is a professional software developer and researcher. He has worked in robotics, bioinformatics, image/audio processing and education at Siemens, IBM and Google. He specializes in AI and machine learning and has a keen interest in DIY projects involving Arduino and 3D printing.

