Voice Assistant on UNIHIKER M10 with OpenAI

The UNIHIKER M10 is a compact single-board computer that integrates a Linux-based system, sensors, and hardware interfaces on one board. It is built around a quad-core Arm Cortex-A35 processor with 512 MB of RAM and 16 GB of eMMC storage.

The board also comes with a 2.8-inch touchscreen with 240 × 320 pixel resolution, Wi-Fi, and Bluetooth connectivity. Built-in sensors include a light sensor, accelerometer, gyroscope, and microphone.

Because the board runs Linux and supports Python programming, it can be used for many applications. In this tutorial you will learn how to implement a Voice Assistant on the UNIHIKER M10 board. The following short video clip demonstrates the Voice Assistant:

You may need to increase the volume on your computer to hear my voice. Also note there is no voice output from this assistant, though it would be easy to add.

Required Parts

You will need a UNIHIKER M10 board. You can get it at DFRobot or at Amazon, for instance:

Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.

Hardware of the UNIHIKER M10

The UNIHIKER M10 is built around a Rockchip RK3308 chip. This chip uses four Arm Cortex-A35 cores and operates at a clock frequency of up to 1.2 GHz. The processor runs a full Debian Linux operating system with Python 3.7 pre-installed. Also pre-installed are many scientific Python libraries, such as NumPy, Matplotlib, Jupyter, pandas, Seaborn, and TensorFlow.

The board includes 512 MB of DDR3 system memory. This memory is used by the Linux operating system and user applications. Program storage is provided by 16 GB of onboard eMMC flash memory.

In addition to the main processor, the board integrates a GD32VF103C8T6 RISC-V microcontroller as a co-processor. It contains 64 KB of flash memory and 32 KB of SRAM. The microcontroller handles many hardware-level tasks such as sensor management, GPIO control, and actuator operation.

Display and User Interface

The UNIHIKER M10 comes with a 2.8-inch color touchscreen display. The display has a resolution of 240 × 320 pixels. The board includes a Home button and two programmable user buttons labeled A and B, as shown below:

Buttons on UNIHIKER M10

Integrated Sensors

Built-in sensors include a light sensor, accelerometer, gyroscope, and microphone. Hardware expansion is available through USB ports, Gravity sensor connectors, and a micro:bit-compatible edge connector that exposes GPIO, I2C, SPI, and UART interfaces.

Front and back of UNIHIKER M10 (source)

Furthermore, a passive buzzer and a status LED provide simple audio and visual output for notifications and debugging. An external speaker can be connected via the USB port.

Connectivity and Networking

Wireless communication is available through a combined Wi-Fi and Bluetooth module. The board supports 2.4 GHz Wi-Fi and Bluetooth 4.0 connectivity. Wi-Fi can be configured via a Web interface running at 10.1.2.3.

Interfaces and Expansion Options

The board provides several physical interfaces for connecting external hardware. A USB Type-C connector is used for power supply and communication with a computer. The board can be powered with a 5 V supply through this port.

A USB Type-A host port allows the connection of USB peripherals such as storage devices, keyboards, or adapters. The board also includes a microSD card slot for expanding storage capacity or transferring files.

Furthermore, multiple expansion connectors are available for sensors and actuators. Gravity-compatible 3-pin connectors provide access to PWM and analog inputs. Dedicated 4-pin connectors provide I2C communication for digital sensors and modules.

IO Connectors of UNIHIKER M10 (source)

Edge Connector

The board also features a micro:bit-compatible edge connector. This connector exposes up to 19 GPIO pins and supports interfaces such as I2C, UART, and SPI. It also provides several ADC inputs and PWM outputs for controlling external hardware. The picture below shows the pinout of the edge connector:

Pinout of Edge Connector (source)

Schematics

For more detailed technical information, see the Schematics of the UNIHIKER M10 linked below:

Technical Specification

The following table summarizes the Technical Specification of the UNIHIKER M10 board:

Feature | Specification
--- | ---
Main Processor | Rockchip RK3308
CPU Architecture | Quad-core Arm Cortex-A35
CPU Frequency | Up to 1.2 GHz
System Memory | 512 MB DDR3
Internal Storage | 16 GB eMMC
Co-Processor | GD32VF103C8T6 RISC-V microcontroller
MCU Clock Speed | Up to 108 MHz
MCU Memory | 64 KB Flash, 32 KB SRAM
Operating System | Debian Linux
Display | 2.8-inch touchscreen
Display Resolution | 240 × 320 pixels
Wireless Connectivity | Wi-Fi 802.11 b/g/n (2.4 GHz), Bluetooth 4.0
Sensors | Light sensor, 6-axis IMU (accelerometer + gyroscope), microphone
User Inputs | Home button, A/B user buttons, touchscreen
Audio Output | Passive buzzer
USB Ports | 1 × USB-C (power and data), 1 × USB-A host
Storage Expansion | microSD card slot
Expansion Interfaces | Gravity 3-pin connectors (analog/PWM), I2C connectors
Edge Connector | micro:bit-compatible edge connector
Supported Interfaces | GPIO, ADC, PWM, I2C, SPI, UART
Operating Voltage | 5 V input via USB-C
Logic Level | 3.3 V
Typical Current | Up to ~2 A
Board Dimensions | Approximately 51.6 mm × 83 mm × 13 mm

Programming the UNIHIKER M10

There are various applications you can choose to program the UNIHIKER M10, such as Jupyter Notebook, Mind+, VS Code, Python IDLE, or Thonny. For detailed instructions on how to set up these applications, see the Getting Started section of the UNIHIKER documentation.

Installing Thonny

I personally found Thonny the easiest to use for a small project like the one in this tutorial. For instructions on how to install it, see the Download and Install Thonny section of the UNIHIKER documentation.

Connecting Thonny to UNIHIKER M10

After installing Thonny, connect your UNIHIKER M10 to your computer with the USB cable. Then open Thonny and click on “Run -> Select interpreter …”:

This will open a dialog box where you pick “Remote Python 3 (SSH)” as the interpreter. Under Host enter “10.1.2.3”, for the Username enter “root”, and set the Authentication method to “password”:

After clicking on “Run -> Stop/Restart backend”, Thonny will be connected to your UNIHIKER M10, and you can edit and run Python programs on it.

Installing PuTTY and the OpenAI library

Next, we need a tool that allows us to install Python libraries, such as the OpenAI library, on the UNIHIKER M10. I tried to install libraries via Thonny (“Manage Packages”, “Open System Shell…”) but couldn’t get it to work.

Instead, I used PuTTY. For installation instructions, see the SSH Tools section of the UNIHIKER documentation. Once you have installed it, open PuTTY and create a session with 10.1.2.3 as the Host Name:

Then click the “Open” button, and in the PuTTY shell enter “root” as the username and “dfrobot” as the password:

Now you can install the OpenAI library via the command “pip install openai”:

Should other Python libraries be missing, you can install them in the same way. Note, however, that the UNIHIKER M10 must be connected via the USB cable.

Setting up Wi-Fi

Finally, we also need to set up the Wi-Fi connection so that our Voice Assistant has internet access to call the AI models at OpenAI.

For that, open a web browser and enter “10.1.2.3” in the address bar. You will see a webpage with an item “Network Settings” in the sidebar. Click on it and enter your Wi-Fi credentials in the dialog on the right:

Getting OpenAI API Key

Our Voice Assistant is going to use AI models provided by OpenAI. You therefore will need an OpenAI account. Go to https://platform.openai.com and sign up with an email address or an existing Google or Microsoft account.

After verifying your email and completing the initial setup, log in to the OpenAI dashboard at platform.openai.com/api-keys and find or create your API key (= secret key) as shown below:

OpenAI API keys

The API key is a unique, long string starting with “sk-proj-” that is needed to authenticate your API requests (see below). Later you will need to copy this entire string into the code for the Voice Assistant.

sk-proj-xcA.......................OtDu0U

That is all you really need, but I recommend setting a usage limit for your account as well. This ensures that you don’t accidentally end up with an expensive bill due to a bug in your code (e.g. one that sends hundreds of requests).

You can set Usage Limits and also find out the Pricing (Cost) for the different AI models under the Billing tab (platform.openai.com/settings/organization/billing).
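A tip on handling the key: in the code below I paste it directly into the script for simplicity, but a safer alternative is to store it in the OPENAI_API_KEY environment variable, which the OpenAI Python client reads automatically. A minimal sketch:

import os
from openai import OpenAI

# Read the key explicitly from the environment ...
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# ... or rely on the client's automatic lookup of OPENAI_API_KEY
client = OpenAI()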

Now we are ready to write the code for the Voice Assistant.

Code for Voice Assistant

In Thonny, create a new file (I called mine “assistant_tk.py”) and copy and paste the code below.

This code implements a voice assistant. It records audio while button A is held, transcribes the speech to text using OpenAI’s Whisper model, sends the text to a GPT-based Large Language Model (LLM) for a response, and displays both the question and answer on the device’s GUI.

# www.makerguides.com
# Python 3.7
# openai 1.39.0
# PyAudio 0.2.11
# pinpong 0.6.1
# numpy 1.21.6

import time
import threading
import numpy as np
import pyaudio
import io
import wave
from openai import OpenAI
from pinpong.board import Board
from pinpong.extension.unihiker import button_a
import tkinter as tk
from tkinter import ttk
from typing import List

API_KEY = "sk-proj-my-api-key"  # replace with your OpenAI API key
DEVICE_INDEX = 2
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK = 1024

client = OpenAI(api_key=API_KEY)

# Initialize the pinpong board so that button_a can be read
Board().begin()

# GUI -----------------------------------------------------

root = tk.Tk()
root.title("Voice Chatbot")
root.geometry("240x320")

status_var = tk.StringVar()
status_var.set("Hold Button A to speak")

status_label = ttk.Label(root, textvariable=status_var,
    font=("Arial", 11), anchor="center")
status_label.pack(fill="x", padx=6, pady=4)

frame = tk.Frame(root)
frame.pack(fill="both", expand=True)

scrollbar = tk.Scrollbar(frame, width=18)
scrollbar.pack(side="right", fill="y")

text_box = tk.Text(frame, wrap="word",
    yscrollcommand=scrollbar.set, font=("Arial", 10))
text_box.bind("<Motion>", lambda e: "break")
text_box.bind("<B1-Motion>", lambda e: "break")
text_box.bind("<Button-1>", lambda e: "break")
text_box.pack(side="left", fill="both", expand=True)

scrollbar.config(command=text_box.yview)


def set_status(text):
    status_var.set(text)
    root.update_idletasks()


def show_answer(text):
    text_box.delete("1.0", tk.END)
    text_box.insert(tk.END, text)
    text_box.see(tk.END)


# Audio helpers -----------------------------------------------

def normalize(pcm_bytes: bytes) -> bytes:
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    peak = np.max(np.abs(samples))
    if peak == 0:
        return pcm_bytes
    gain = (0.9 * 32767) / peak
    normalized = np.clip(samples * gain, -32768, 32767).astype(np.int16)
    return normalized.tobytes()


def record_while_held() -> bytes:
    frames: List[bytes] = []

    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=SAMPLE_RATE,
        input=True,
        input_device_index=DEVICE_INDEX,
        frames_per_buffer=CHUNK
    )

    set_status("Hold Button A to speak")

    while not button_a.is_pressed():
        time.sleep(0.01)

    set_status("recording...")

    while button_a.is_pressed():
        data = stream.read(CHUNK, exception_on_overflow=False)
        frames.append(data)

    stream.stop_stream()
    stream.close()
    pa.terminate()

    return b"".join(frames)


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()


# OpenAI -----------------------------------------------

def transcribe(wav_bytes: bytes) -> str:
    audio_file = io.BytesIO(wav_bytes)
    audio_file.name = "recording.wav"

    set_status("transcribing...")
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        temperature=0
    )
    return response.text


def ask_gpt(question: str) -> str:
    set_status("thinking...")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant. Answer clearly and concisely."
            },
            {"role": "user", "content": question}
        ],
    )
    return response.choices[0].message.content


# Main assistant loop -----------------------------------------------

def assistant_loop():
    while True:
        raw_pcm = record_while_held()
        if not raw_pcm:
            continue

        normalized_pcm = normalize(raw_pcm)
        wav_bytes = pcm_to_wav(normalized_pcm)
        question = transcribe(wav_bytes)
        answer = ask_gpt(question)

        set_status("ready")
        show_answer(question + "\n\n" + answer)


threading.Thread(target=assistant_loop, daemon=True).start()
root.mainloop()

Imports

The code starts by importing several modules necessary for its functionality. These include standard libraries like time, threading, and io for timing, concurrent execution, and byte stream handling. It also imports numpy for numerical operations on audio data, pyaudio for audio recording, and wave for handling WAV audio format.

The OpenAI Python client is imported to interact with OpenAI’s API. The pinpong Board class and the UNIHIKER-specific button_a are imported to initialize the board and detect button presses. Finally, tkinter and ttk are imported to create the GUI, and List from typing is used for type hinting.

import time
import threading
import numpy as np
import pyaudio
import io
import wave
from openai import OpenAI
from pinpong.board import Board
from pinpong.extension.unihiker import button_a
import tkinter as tk
from tkinter import ttk
from typing import List

Constants and Client Initialization

Several constants are defined: the API key for OpenAI, the audio input device index, the sample rate, the number of channels, and the chunk size for audio buffering. An OpenAI client object is then created using the provided API key, enabling communication with OpenAI’s services. Finally, Board().begin() initializes the pinpong library so that the state of button A can be read.

API_KEY = "sk-proj-my-api-key"  # replace with your OpenAI API key
DEVICE_INDEX = 2
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK = 1024

client = OpenAI(api_key=API_KEY)

# Initialize the pinpong board so that button_a can be read
Board().begin()

Note that you have to replace the dummy value “sk-proj-my-api-key” for API_KEY with your own OpenAI API key!
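The DEVICE_INDEX constant selects the microphone that PyAudio records from (index 2 in the code above). If recording fails on your board, you can list the available input devices with a small diagnostic snippet like the following and adjust DEVICE_INDEX accordingly:

import pyaudio

# Print the index and name of every device that can record audio
pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print(i, info["name"])
pa.terminate()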

Graphical User Interface (GUI)

The GUI is built using tkinter. A main window is created with a title and fixed size. A status label is added to inform the user about the current state, such as prompting to hold Button A to speak or showing progress messages.

Below the status label, a text box with a scrollbar is provided to display the transcribed question and the assistant’s answer. The text box is made effectively read-only by blocking the mouse and touch events that could modify its content.

root = tk.Tk()
root.title("Voice Chatbot")
root.geometry("240x320")

status_var = tk.StringVar()
status_var.set("Hold Button A to speak")

status_label = ttk.Label(root, textvariable=status_var,
    font=("Arial", 11), anchor="center")
status_label.pack(fill="x", padx=6, pady=4)

frame = tk.Frame(root)
frame.pack(fill="both", expand=True)

scrollbar = tk.Scrollbar(frame, width=18)
scrollbar.pack(side="right", fill="y")

text_box = tk.Text(frame, wrap="word",
    yscrollcommand=scrollbar.set, font=("Arial", 10))
text_box.bind("<Motion>", lambda e: "break")
text_box.bind("<B1-Motion>", lambda e: "break")
text_box.bind("<Button-1>", lambda e: "break")
text_box.pack(side="left", fill="both", expand=True)

scrollbar.config(command=text_box.yview)

Status and Display Functions

Two helper functions manage the GUI updates. The set_status() function updates the status label text and forces the GUI to refresh immediately. The show_answer() function clears the text box and inserts new text, then scrolls to the end to ensure the latest content is visible.

def set_status(text):
    status_var.set(text)
    root.update_idletasks()


def show_answer(text):
    text_box.delete("1.0", tk.END)
    text_box.insert(tk.END, text)
    text_box.see(tk.END)

Audio Helpers

The code includes several functions to handle audio processing. The normalize() function takes raw PCM audio bytes, converts them to floating-point samples, and scales them so that the peak amplitude reaches 90% of the maximum 16-bit integer range. This normalization ensures consistent audio volume for transcription.

def normalize(pcm_bytes: bytes) -> bytes:
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    peak = np.max(np.abs(samples))
    if peak == 0:
        return pcm_bytes
    gain = (0.9 * 32767) / peak
    normalized = np.clip(samples * gain, -32768, 32767).astype(np.int16)
    return normalized.tobytes()
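To illustrate the effect, here is a small check with three made-up samples; the peak value of -2000 is scaled up to roughly 90% of the int16 range:

import numpy as np

quiet = np.array([1000, -2000, 500], dtype=np.int16).tobytes()
boosted = np.frombuffer(normalize(quiet), dtype=np.int16)
print(boosted)  # [ 14745 -29490   7372] -- the peak now sits at ~0.9 * 32767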

The record_while_held() function records audio from the specified input device while Button A is pressed.

It initializes a PyAudio stream with the configured parameters and waits until Button A is pressed. Then, it continuously reads audio chunks and appends them to a list until the button is released. The recorded audio frames are concatenated and returned as raw PCM bytes.

def record_while_held() -> bytes:
    frames: List[bytes] = []

    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=SAMPLE_RATE,
        input=True,
        input_device_index=DEVICE_INDEX,
        frames_per_buffer=CHUNK
    )

    set_status("Hold Button A to speak")
    while not button_a.is_pressed():
        time.sleep(0.01)

    set_status("recording...")
    while button_a.is_pressed():
        data = stream.read(CHUNK, exception_on_overflow=False)
        frames.append(data)

    stream.stop_stream()
    stream.close()
    pa.terminate()

    return b"".join(frames)

The pcm_to_wav() function converts the raw PCM bytes into a WAV format byte stream using the wave module. This is necessary because the OpenAI transcription API expects audio files in standard formats like WAV.

def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
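As a quick sanity check, you can round-trip one second of silence through pcm_to_wav() and read the header back with the wave module:

import io
import wave

silence = b"\x00\x00" * SAMPLE_RATE  # one second of 16-bit silence
with wave.open(io.BytesIO(pcm_to_wav(silence)), "rb") as wf:
    print(wf.getnchannels(), wf.getframerate(), wf.getnframes())  # 1 16000 16000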

OpenAI API Interaction

Two functions handle communication with OpenAI’s API. The transcribe() function sends the WAV audio bytes to the Whisper model to obtain a text transcription. It updates the status label to indicate transcription is in progress and returns the recognized text.

def transcribe(wav_bytes: bytes) -> str:
    audio_file = io.BytesIO(wav_bytes)
    audio_file.name = "recording.wav"

    set_status("transcribing...")

    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        temperature=0
    )

    return response.text

The ask_gpt() function sends the transcribed question to a GPT-based chat completion model. It sets the status to “thinking…” while waiting for the response. The request includes a system message that instructs the model to be a helpful, concise assistant. Finally, the function returns the assistant’s reply text.

def ask_gpt(question: str) -> str:
    set_status("thinking...")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant. Answer clearly and concisely."
            },
            {"role": "user", "content": question}
        ],
    )

    return response.choices[0].message.content

Main Assistant Loop

The core functionality runs inside the assistant_loop() function, which runs in a separate thread to keep the GUI responsive.

It continuously waits for the user to hold Button A and records audio while the button is pressed. If no audio is captured, it restarts the loop. Otherwise, it normalizes the audio, converts it to WAV format, transcribes the speech, and queries the GPT model for an answer.

After processing, it updates the status to “ready” and displays both the question and the answer in the text box.

def assistant_loop():
    while True:
        raw_pcm = record_while_held()
        if not raw_pcm:
            continue

        normalized_pcm = normalize(raw_pcm)
        wav_bytes = pcm_to_wav(normalized_pcm)

        question = transcribe(wav_bytes)
        answer = ask_gpt(question)

        set_status("ready")
        show_answer(question + "\n\n" + answer)

Starting the Assistant and GUI Mainloop

Finally, the assistant loop is started in a daemon thread, allowing it to run in the background. The GUI’s main event loop is then started with root.mainloop(), which keeps the window open and responsive to user interactions.

threading.Thread(target=assistant_loop, daemon=True).start()
root.mainloop()

Error messages

If you run the code you may see the following warnings on the Thonny console:

ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_a52.c:823:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib conf.c:5014:(snd_config_expand) Unknown parameters {AES0 0x6 AES1 0x82 AES2 0x0 AES3 0x2  CARD 0}
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM iec958:{AES0 0x6 AES1 0x82 AES2 0x0 AES3 0x2  CARD 0}
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card

This is because when you initialize pyaudio.PyAudio(), the underlying ALSA (Advanced Linux Sound Architecture) library scans your system for all possible audio devices and configurations (like 5.1 surround sound, HDMI, or digital S/PDIF).

Since the UNIHIKER M10 is a compact single-board computer, it doesn’t have “rear speakers” or “modem ports”, so ALSA complains that it can’t find them. If your code actually runs and handles audio after these warnings, your setup is working fine.
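If you want to silence these warnings, a commonly used workaround is to register a no-op error handler with the ALSA library via ctypes before pyaudio.PyAudio() is initialized. A sketch, assuming libasound.so.2 is present (it is on Debian):

from ctypes import CFUNCTYPE, c_char_p, c_int, cdll

# ALSA's error handler signature: (file, line, function, err, fmt)
ERROR_HANDLER_FUNC = CFUNCTYPE(None, c_char_p, c_int, c_char_p, c_int, c_char_p)

def _quiet_handler(filename, line, function, err, fmt):
    pass  # discard ALSA's configuration warnings

# Keep a reference so the callback is not garbage-collected
_c_handler = ERROR_HANDLER_FUNC(_quiet_handler)
cdll.LoadLibrary("libasound.so.2").snd_lib_error_set_handler(_c_handler)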

Conclusions

In this tutorial you learned how to implement a Voice Assistant on the UNIHIKER M10 using OpenAI services. For other, simpler code examples have a look at the Python Coding Examples of the UNIHIKER documentation.

Note that you could easily extend and improve the Voice Assistant by adding a chat history (see the sketch below) and tool calling functionality. Also, I used Tkinter to create a simple user interface, but for a better GUI you could use PyQt, which is preinstalled on the UNIHIKER M10 as well.
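As a starting point for a chat history, you could keep the messages list between calls and append each question and answer to it. A minimal sketch of such a stateful variant of ask_gpt():

# Sketch: remembers the conversation across turns
history = [{"role": "system",
            "content": "You are a helpful assistant. Answer clearly and concisely."}]

def ask_gpt_with_history(question: str) -> str:
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer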

Furthermore, you could add a speaker, either via USB, Bluetooth or line-out, to let your Voice Assistant respond with audio. See the AI Language Tutor with UNIHIKER M10 tutorial for an example.

For a Vision Chatbot that can see and answer questions about images see our Vision Chatbot with DFRobot ESP32-S3 AI Camera and OpenAI tutorial. And for more AI examples, see the AI Projects section of the UNIHIKER docs.

If you have any questions feel free to leave them in the comment section.

Happy Tinkering 😉