
AI Language Tutor with UNIHIKER M10


In this tutorial you will learn how to build your own AI Language Tutor on a UNIHIKER M10 using OpenAI’s AI models. You will need an OpenAI account and an API key to follow along.

The UNIHIKER M10 is a small board with 512 MB RAM, 16 GB eMMC storage, and a 2.8-inch touchscreen. Apart from Wi-Fi and Bluetooth connectivity, the board includes built-in sensors such as a light sensor, accelerometer, gyroscope, and, most importantly, a microphone.

It runs Debian Linux, supports Python programming and comes with many Python libraries pre-installed. This makes it easy to implement AI solutions such as an AI Language Tutor. The following short video clip demonstrates the AI Language Tutor we are going to build.

You may need to increase the volume on your computer to hear my voice and the response of the AI.

Required Parts

For this tutorial you need a UNIHIKER M10 board. You can get it at DFRobot or at Amazon. If you have a USB speaker, you can simply connect it to the UNIHIKER M10. But if you want to add your own small speakers, you will need a PAM8403 amplifier and one or two of the speakers listed below.

PAM8403 Amplifier

2 x Speaker 3 Watt 4 Ohm

Makerguides is a participant in affiliate advertising programs designed to provide a means for sites to earn advertising fees by linking to Amazon, AliExpress, Elecrow, and other sites. As an Affiliate we may earn from qualifying purchases.

Hardware of the UNIHIKER M10

The UNIHIKER M10 is a compact single-board computer for education, prototyping, and AIoT applications. It comes with a Linux-based system, sensors, and hardware interfaces on one board.

The device is based on a quad-core Arm Cortex-A35 processor running up to 1.2 GHz. It includes 512 MB RAM and 16 GB eMMC storage and runs the Debian Linux operating system.

Front and back of UNIHIKER M10 (source)

The board integrates a 2.8-inch touchscreen with 240 × 320 pixel resolution, Wi-Fi, and Bluetooth connectivity. Built-in sensors include a light sensor, accelerometer, gyroscope, and microphone. Hardware expansion is available through USB ports, Gravity sensor connectors, and a micro:bit-compatible edge connector that exposes GPIO, I2C, SPI, and UART interfaces.

For more technical details see our Voice Assistant on UNIHIKER M10 with OpenAI tutorial. That tutorial also explains how to program the UNIHIKER, how to install Python libraries, and how to configure the Wi-Fi. You will need all of that to run the AI Tutor we build in this tutorial.

Connecting Loudspeakers to UNIHIKER M10

Our AI Tutor will be able to generate speech. For that we need speakers. You can connect speakers to the UNIHIKER M10 via USB or Bluetooth. This is the easiest solution, and if you opt for it, you can skip this section.

I wanted to use smaller speakers to make my AI Tutor portable. The UNIHIKER M10 has a line-out output but unfortunately no connector for it. You need to solder wires to specific pads on the back of the board, and you also need a small amplifier to drive the speakers. I got the wiring from the AI Assistant with OpenAI GPT, Azure Speech API and UNIHIKER post.

PAM8403 Amplifier

I used a PAM8403 module as the amplifier. It is a small Class-D stereo audio amplifier based on the PAM8403 chip. It accepts a supply voltage from 2.5 V up to 5.5 V. At 5 V, driving 4 Ω speakers, each channel can produce up to about 3 W of output power. The following picture shows the pinout of the PAM8403 amplifier module:

Pinout of PAM8403 Amplifier

For more information about the PAM8403 amplifier module see our Audio with PAM8403, PCM5102 and ESP32 tutorial.
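The ~3 W figure can be sanity-checked with a back-of-envelope calculation. The PAM8403 drives each speaker in a bridged (BTL) configuration, so the ideal peak output into a 4 Ω load at a 5 V supply, ignoring switching losses and clipping, is:

```python
# Ideal maximum output power of a bridged (BTL) Class-D output stage:
# P = Vcc^2 / (2 * R_load), ignoring losses
vcc = 5.0      # supply voltage in volts
r_load = 4.0   # speaker impedance in ohms

p_max = vcc ** 2 / (2 * r_load)  # about 3.1 W per channel
print(f"{p_max:.2f} W")
```

The real-world figure in the datasheet (about 3 W at 10 % THD) matches this estimate closely.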

Lineout Output of UNIHIKER M10

As mentioned, there is no connector for the lineout output on the UNIHIKER M10 but you can access it via (test) pads on the back of the board (schematics). The picture below shows the location of the Lineout and power supply pads that we need to connect the PAM8403 amplifier to:

Lineout and power supply pads on UNIHIKER M10

I measured 4.7 V on the VCC pad, which is strange, since I expected 3.3 V or 5 V, but the PAM8403 works fine with 4.7 V. Also, my board had only one pad for VCC, while the photo above shows two pads. Nevertheless, everything worked fine.

Connecting Amplifier and Loudspeakers to UNIHIKER M10

In this section I will show you how to connect the PAM8403 amplifier and the loudspeakers to the UNIHIKER M10. The small 3 W speakers often come with plugs that you will need to cut off, since we solder the speaker wires directly to the PAM8403 module.

The following picture shows you how to wire the UNIHIKER M10 to the PAM8403 amplifier and the speakers:

Soldering the wires to the pads of the UNIHIKER M10 is easy but make sure you connect to the correct pads and don’t damage anything in the process. The photo below shows my completed wiring:

Note that you don’t need two speakers, since stereo sound is not really required. But two speakers will be louder than one speaker. If you want really loud sound, use a USB speaker with an external power supply.

User Manual for AI Language Tutor

In this section I will quickly explain how the user interface for the AI Language Tutor works. This will make it easier for you to understand the code in the next section and to use the Tutor software. The picture below shows the GUI with the functional elements annotated:

You press the A button and hold it while speaking. When you release the A button the recorded audio is sent to OpenAI for transcription and translation. The transcription and translation are then displayed in the Answer field and the translation is voiced out. You can replay the translation audio by pressing the B button.

You can select the target language for the translation by pressing the “Select Language” button at the top. It currently cycles through “Japanese”, “Italian”, and “German”. But you can easily extend this to other languages in the code.

Note that while the user interface is in English, you can actually speak in any language supported by OpenAI’s speech-to-text model (link). The system will detect the language (if it is supported) and translate it into the selected target language. You can even speak in the target language to check your pronunciation. See the following two video clips for a demo, where I speak German and Japanese:

When you press the “Explain” button at the bottom, an explanation of the grammar of the translation will be printed out. Similarly, if you press the “Example” button, example sentences with a similar grammar will be added. The following two video clips demonstrate the functionality:

Finally, there are buttons to increase and decrease the volume and font size at the top of the screen and scroll buttons at the bottom of the screen.

While you can use your fingers to control the UI, using the little pen that comes with the UNIHIKER M10 works better. Note that you can even touch the answer field and drag/scroll it instead of using the scroll buttons or the scroll bar.

Getting OpenAI API Key

Our AI Language Tutor is going to use AI models provided by OpenAI. You will therefore need an OpenAI account and an API key. Go to https://platform.openai.com and sign up with an email address or an existing Google or Microsoft account.

After verifying your email and completing the initial setup, log in to the OpenAI dashboard at platform.openai.com/api-keys and find or create your API key (= secret key) as shown below:

OpenAI API keys

The API key is a unique, long string starting with “sk-proj-” that is needed to authenticate your API requests. Later you will need to copy this entire string into the code for the AI Language Tutor.
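If you prefer not to hard-code the key into the script, you can read it from an environment variable instead; OPENAI_API_KEY is the conventional variable name that the OpenAI Python library also recognizes. A minimal sketch:

```python
import os

# Prefer an environment variable over a hard-coded key; the placeholder
# is only used as a fallback so the rest of the script can still start.
API_KEY = os.environ.get("OPENAI_API_KEY", "sk-proj-my-api_key")
```

On the UNIHIKER you can set the variable before launching the script, e.g. `export OPENAI_API_KEY=sk-proj-...` in the shell.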

Code for AI Language Tutor

The following code implements our AI-powered language tutor application. It allows users to speak phrases in any language, which are then transcribed, translated into a selected target language, and played back using text-to-speech (TTS). The app also offers grammar explanations and example sentences to aid language learning.

# www.makerguides.com
# Python 3.7
# openai 1.39.0
# PyAudio 0.2.11
# pinpong 0.6.1
# numpy 1.21.6

import sys
import time
import os
import threading
import tempfile
import numpy as np
import pyaudio
import io
import wave
import pygame

from openai import OpenAI
from pinpong.extension.unihiker import button_a, button_b

from qtpy.QtWidgets import (
    QApplication,
    QWidget,
    QVBoxLayout,
    QHBoxLayout,
    QLabel,
    QTextEdit,
    QScroller,
    QPushButton,
)
from qtpy.QtCore import Qt, QObject, Signal, QThread, Slot


API_KEY = "sk-proj-my-api_key"  # SET YOUR API KEY HERE!
DEVICE_INDEX = 2
SAMPLE_RATE  = 16000
CHANNELS     = 1
CHUNK        = 1024

LANGUAGES       = ["Japanese", "Italian", "German"]
DEFAULT_STATUS  = "Hold A: new  |  B: replay"
VOLUME_STEP     = 0.1      # increment per button press
INITIAL_VOLUME  = 0.8      # volume level when not muted
_volume = INITIAL_VOLUME   # float, 0.0 – 1.0

TTS_FILE = os.path.join(tempfile.gettempdir(), "tts_translation.mp3")

client = OpenAI(api_key=API_KEY)

pygame.mixer.init()


# STYLE ----------------------------------------------------------

ORA_BG      = "#fcdb03"   # bright yellow
ORA_DARK    = "#d4a800"   # darker yellow (pressed / language btn)
ORA_DIM     = "#fef08a"   # disabled button bg
ORA_TEXT    = "#000000"   # black text everywhere
ORA_DT      = "#888800"   # disabled button text

def _btn_style(bg=ORA_BG, fg=ORA_TEXT, en=ORA_BG,
               pr=ORA_DARK, ds=ORA_DIM, dt=ORA_DT):
    return (
        f"QPushButton          {{ background: {bg}; color: {fg};"
        f"                        border: none; font-size: 13px; }}"
        f"QPushButton:enabled  {{ background: {en}; }}"
        f"QPushButton:pressed  {{ background: {pr}; }}"
        f"QPushButton:disabled {{ background: {ds}; color: {dt}; }}"
    )



# WORKER THREAD ----------------------------------------------------

class AssistantWorker(QObject):

    status          = Signal(str)
    answer          = Signal(str)
    btn_explain_on  = Signal(bool)
    btn_examples_on = Signal(bool)
    language_changed = Signal(str)   

    def __init__(self):
        super().__init__()
        self._last_question    = ""
        self._last_translation = ""
        self._last_grammar     = ""
        self._last_examples    = ""
        self._tts_ready        = False

        self._lang_index = 0  
        self._language   = LANGUAGES[0]

        # Thread-safe flags
        self._explain_requested  = False
        self._examples_requested = False
        self._lang_requested     = False
        self._lock = threading.Lock()

        self._pa     = pyaudio.PyAudio()
        self._stream = self._pa.open(
            format=pyaudio.paInt16,
            channels=CHANNELS,
            rate=SAMPLE_RATE,
            input=True,
            input_device_index=DEVICE_INDEX,
            frames_per_buffer=CHUNK,
        )

    @Slot()
    def on_explain_clicked(self):
        with self._lock:
            self._explain_requested = True

    @Slot()
    def on_examples_clicked(self):
        with self._lock:
            self._examples_requested = True

    @Slot()
    def on_language_clicked(self):
        with self._lock:
            self._lang_requested = True

    def run(self):
        self.status.emit("Hold Button A to speak")

        while True:
            a_pressed = button_a.is_pressed()
            b_pressed = button_b.is_pressed()

            with self._lock:
                explain_req  = self._explain_requested
                examples_req = self._examples_requested
                lang_req     = self._lang_requested
                self._explain_requested  = False
                self._examples_requested = False
                self._lang_requested     = False

            if lang_req and not a_pressed:
                self._lang_index = (self._lang_index + 1) % len(LANGUAGES)
                self._language   = LANGUAGES[self._lang_index]
                self.language_changed.emit(self._language)

            elif a_pressed:
                self._clear_state()
                raw_pcm = record_while_held(self, self._stream)
                if not raw_pcm:
                    self.status.emit("Hold Button A to speak")
                    continue

                self.status.emit("Transcribing...")
                normalized = normalize(raw_pcm)
                wav_bytes  = pcm_to_wav(normalized)
                question   = transcribe(wav_bytes)

                self.status.emit(f"Translating to {self._language}...")
                translation = translate_only(question, self._language)

                self._last_question    = question
                self._last_translation = translation

                self._refresh_display()
                self.btn_explain_on.emit(True)
                self.btn_examples_on.emit(True)

                self.status.emit("Generating audio...")
                tts_text = extract_first_line(translation)
                generate_tts(tts_text, TTS_FILE)
                self._tts_ready = True
                self._set_default_status()
                play_audio(TTS_FILE)

            elif b_pressed:
                while button_b.is_pressed():
                    time.sleep(0.01)
                if self._tts_ready:
                    self.status.emit("Replaying...")
                    play_audio(TTS_FILE)
                    self._set_default_status()

            elif explain_req and self._last_translation:
                self.status.emit("Explaining grammar...")
                self._last_grammar = explain_grammar(
                    self._last_question, self._last_translation, self._language
                )
                self._refresh_display()
                self._set_default_status()

            elif examples_req and self._last_translation:
                self.status.emit("Generating examples...")
                self._last_examples = add_examples(
                    self._last_question, self._last_translation, self._language
                )
                self._refresh_display()
                self._set_default_status()

            else:
                time.sleep(0.02)

    def _set_default_status(self):
        self.status.emit(DEFAULT_STATUS)

    def _clear_state(self):
        self._last_question    = ""
        self._last_translation = ""
        self._last_grammar     = ""
        self._last_examples    = ""
        self._tts_ready        = False
        if pygame.mixer.music.get_busy():
            pygame.mixer.music.stop()
        if os.path.exists(TTS_FILE):
            try:
                os.remove(TTS_FILE)
            except OSError:
                pass
        self.answer.emit("")
        self.btn_explain_on.emit(False)
        self.btn_examples_on.emit(False)

    def _refresh_display(self):
        parts = []
        if self._last_question:
            parts.append(self._last_question)
        if self._last_translation:
            parts.append(self._last_translation)
        if self._last_grammar:
            parts.append(f"── Grammar ──\n{self._last_grammar}")
        if self._last_examples:
            parts.append(f"── Examples ──\n{self._last_examples}")
        self.answer.emit("\n\n".join(parts))



# GUI ----------------------------------------------------

class AssistantUI(QWidget):

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Voice Chatbot")
        self.setFixedSize(240, 320)

        # Orange window background, black text; white text area
        self.setStyleSheet(
            f"QWidget  {{ background-color: {ORA_BG}; color: {ORA_TEXT}; }}"
            f"QTextEdit {{ background-color: #ffffff; color: {ORA_TEXT};"
            f"             border: none; }}"
        )

        layout = QVBoxLayout(self)
        layout.setContentsMargins(4, 4, 4, 4)
        layout.setSpacing(3)

        lang_row = QHBoxLayout()
        lang_row.setSpacing(4)

        _dark_style = _btn_style(bg=ORA_DARK, en=ORA_DARK, pr="#8a3d00")

        self.btn_font_minus = QPushButton("-")
        self.btn_font_minus.setFixedSize(28, 26)
        self.btn_font_minus.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_font_minus)

        self.btn_vol_down = QPushButton("<")
        self.btn_vol_down.setFixedSize(28, 26)
        self.btn_vol_down.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_vol_down)

        self.btn_language = QPushButton(LANGUAGES[0])
        self.btn_language.setFixedHeight(26)
        self.btn_language.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_language, stretch=1)

        self.btn_vol_up = QPushButton(">")
        self.btn_vol_up.setFixedSize(28, 26)
        self.btn_vol_up.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_vol_up)

        self.btn_font_plus = QPushButton("+")
        self.btn_font_plus.setFixedSize(28, 26)
        self.btn_font_plus.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_font_plus)

        layout.addLayout(lang_row)

        self._font_size = 10
        self.btn_font_plus.clicked.connect(self._font_increase)
        self.btn_font_minus.clicked.connect(self._font_decrease)
        self.btn_vol_down.clicked.connect(self._vol_decrease)
        self.btn_vol_up.clicked.connect(self._vol_increase)

        self.status = QLabel("Starting...")
        self.status.setAlignment(Qt.AlignCenter)
        self.status.setStyleSheet(
            "QLabel { background-color: #ffffff; color: #000000; padding: 1px; }"
        )
        layout.addWidget(self.status)

        self.text = QTextEdit()
        self.text.setReadOnly(True)
        self.text.setTextInteractionFlags(Qt.NoTextInteraction)
        layout.addWidget(self.text, stretch=1)

        QScroller.grabGesture(
            self.text.viewport(),
            QScroller.LeftMouseButtonGesture
        )

        btn_row = QHBoxLayout()
        btn_row.setSpacing(4)

        _scroll_style = _btn_style(bg=ORA_DARK, en=ORA_DARK, pr="#8a3d00")

        self.btn_scroll_up = QPushButton("▲")
        self.btn_scroll_up.setFixedSize(40, 26)
        self.btn_scroll_up.setStyleSheet(_scroll_style)
        btn_row.addWidget(self.btn_scroll_up)

        self.btn_explain  = QPushButton("Explain")
        self.btn_examples = QPushButton("Examples")

        for btn in (self.btn_explain, self.btn_examples):
            btn.setEnabled(False)
            btn.setFixedHeight(26)
            btn.setStyleSheet(_btn_style())
            btn_row.addWidget(btn, stretch=1)

        self.btn_scroll_down = QPushButton("▼")
        self.btn_scroll_down.setFixedSize(40, 26)
        self.btn_scroll_down.setStyleSheet(_scroll_style)
        btn_row.addWidget(self.btn_scroll_down)

        layout.addLayout(btn_row)

        self.btn_scroll_down.clicked.connect(self._scroll_down)
        self.btn_scroll_up.clicked.connect(self._scroll_up)


    def update_status(self, txt):
        self.status.setText(txt)

    def update_answer(self, txt):
        self.text.clear()
        self.text.setPlainText(txt)
        self.text.verticalScrollBar().setValue(0)

    def set_explain_enabled(self, enabled: bool):
        self.btn_explain.setEnabled(enabled)

    def set_examples_enabled(self, enabled: bool):
        self.btn_examples.setEnabled(enabled)

    def update_language_label(self, lang: str):
        self.btn_language.setText(lang)

    def _scroll_down(self):
        sb = self.text.verticalScrollBar()
        sb.setValue(sb.value() + self.text.viewport().height())

    def _scroll_up(self):
        sb = self.text.verticalScrollBar()
        sb.setValue(sb.value() - self.text.viewport().height())

    def _font_increase(self):
        self._font_size = min(self._font_size + 1, 28)
        self._apply_font_size()

    def _font_decrease(self):
        self._font_size = max(self._font_size - 1, 6)
        self._apply_font_size()

    def _apply_font_size(self):
        font = self.text.font()
        font.setPointSize(self._font_size)
        self.text.setFont(font)


    def _vol_increase(self):
        global _volume
        _volume = min(round(_volume + VOLUME_STEP, 1), 1.0)

    def _vol_decrease(self):
        global _volume
        _volume = max(round(_volume - VOLUME_STEP, 1), 0.0)



# AUDIO HELPERS  ----------------------------------------------------

def normalize(pcm_bytes: bytes) -> bytes:
    samples = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    peak = np.max(np.abs(samples))
    if peak == 0:
        return pcm_bytes
    gain = (0.9 * 32767) / peak
    return np.clip(samples * gain, -32768, 32767).astype(np.int16).tobytes()


def record_while_held(worker: AssistantWorker, stream) -> bytes:
    try:
        while stream.get_read_available() > 0:
            stream.read(stream.get_read_available(), exception_on_overflow=False)
    except Exception:
        pass

    worker.status.emit("Recording...")
    frames = []
    while button_a.is_pressed():
        frames.append(stream.read(CHUNK, exception_on_overflow=False))
    return b"".join(frames)


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()


def generate_tts(text: str, filepath: str) -> None:
    """Call OpenAI TTS and save to filepath (mp3)."""
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )
    with open(filepath, "wb") as f:
        f.write(response.content)


def play_audio(filepath: str) -> None:
    """Play audio at the current volume, then mute the output again."""
    pygame.mixer.music.set_volume(_volume)
    pygame.mixer.music.load(filepath)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.05)
    pygame.mixer.music.set_volume(0.0) 


# OPENAI HELPERS ------------------------------------------------------

def transcribe(wav_bytes: bytes) -> str:
    audio_file      = io.BytesIO(wav_bytes)
    audio_file.name = "recording.wav"
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        temperature=0,
    )
    return response.text


def _script_hint(language: str) -> str:
    """Return a script hint for the system prompt based on language."""
    hints = {
        "Japanese": "in Kanji/Kana on one line, then the romaji reading on the next line",
        "Italian":  "in Italian script",
        "German":   "in German script",
    }
    return hints.get(language, "in the target language's script")


def translate_only(question: str, language: str) -> str:
    hint = _script_hint(language)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    f"Translate the given English phrase to {language}. "
                    f"Provide the translation {hint}. "
                    "Do NOT include grammar explanations or example sentences."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


def extract_first_line(translation: str) -> str:
    """Return the first line of the translation for TTS (skips romaji line)."""
    return translation.splitlines()[0].strip() if translation else translation


def explain_grammar(question: str, translation: str, language: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a {language} language teacher. "
                    f"Explain the grammar of the {language} translation clearly and concisely. "
                    "Do NOT provide additional example sentences."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Original English: {question}\n"
                    f"{language} translation: {translation}\n\n"
                    "Please explain the grammar."
                ),
            },
        ],
    )
    return response.choices[0].message.content


def add_examples(question: str, translation: str, language: str) -> str:
    hint = _script_hint(language)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a {language} language teacher. "
                    f"Provide 2-3 natural example sentences in {language} ({hint}) "
                    "that illustrate the same grammar pattern or vocabulary. "
                    "Include a short English translation for each."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Original English: {question}\n"
                    f"{language} translation: {translation}\n\n"
                    "Please provide example sentences."
                ),
            },
        ],
    )
    return response.choices[0].message.content



# MAIN ------------------------------------------------------

def main():
    app = QApplication(sys.argv)

    ui     = AssistantUI()
    worker = AssistantWorker()
    thread = QThread()

    worker.moveToThread(thread)
    thread.started.connect(worker.run)
    worker.status.connect(ui.update_status)
    worker.answer.connect(ui.update_answer)
    worker.btn_explain_on.connect(ui.set_explain_enabled)
    worker.btn_examples_on.connect(ui.set_examples_enabled)
    worker.language_changed.connect(ui.update_language_label)

    ui.btn_explain.clicked.connect(worker.on_explain_clicked,  Qt.DirectConnection)
    ui.btn_examples.clicked.connect(worker.on_examples_clicked, Qt.DirectConnection)
    ui.btn_language.clicked.connect(worker.on_language_clicked, Qt.DirectConnection)

    ui.show()
    thread.start()

    sys.exit(app.exec())


if __name__ == "__main__":
    main()

Imports

The program begins by importing necessary Python modules and libraries. These include standard modules like sys, time, os, and threading for system interaction, timing, file handling, and concurrency. It uses numpy for numerical operations on audio data, pyaudio for audio recording, and pygame for audio playback.

The code also imports the OpenAI client library to access AI services, and specific hardware buttons button_a and button_b from the UNIHIKER extension. For the graphical user interface (GUI), it uses qtpy to create widgets and manage signals and slots.

Note that you will need to install the OpenAI library, since it is not pre-installed on the UNIHIKER M10. If you need help with this, see the Voice Assistant on UNIHIKER M10 with OpenAI tutorial.

Constants and Configuration

Several constants define the application’s behavior and appearance. These include the OpenAI API key, audio input device parameters such as sample rate and channels, and UI-related constants like supported languages and color codes for styling buttons and backgrounds.

Volume control parameters are also set, including the initial volume and the step size for volume adjustments.

API_KEY = "sk-proj-my-api_key"
DEVICE_INDEX = 2
SAMPLE_RATE  = 16000
CHANNELS     = 1
CHUNK        = 1024

LANGUAGES       = ["Japanese", "Italian", "German"]
DEFAULT_STATUS  = "Hold A: new  |  B: replay"
VOLUME_STEP     = 0.1
INITIAL_VOLUME  = 0.8
_volume = INITIAL_VOLUME

Remember that you have to replace the value of the API_KEY constant with your own API key!
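The DEVICE_INDEX constant selects the microphone; the value 2 works for the UNIHIKER M10’s built-in microphone, but if recording fails on your board you can list the available input devices with a small helper like this (a sketch; the function name is my own):

```python
try:
    import pyaudio
except ImportError:
    pyaudio = None  # allows the helper to run even without PyAudio installed

def list_input_devices():
    """Return (index, name) pairs for all devices that can record audio."""
    if pyaudio is None:
        return []
    pa = pyaudio.PyAudio()
    devices = [
        (i, pa.get_device_info_by_index(i)["name"])
        for i in range(pa.get_device_count())
        if pa.get_device_info_by_index(i)["maxInputChannels"] > 0
    ]
    pa.terminate()
    return devices

for idx, name in list_input_devices():
    print(idx, name)
```

Use the printed index of the microphone as the new value for DEVICE_INDEX.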

Button Styling

The function _btn_style() returns a stylesheet string that defines the appearance of buttons in different states (normal, pressed, disabled). This function centralizes the styling to maintain a consistent look throughout the UI.

AssistantWorker Class

This class encapsulates the core logic running in a separate thread to keep the UI responsive. It manages audio recording, interaction with OpenAI APIs, and state management.

The class defines signals to communicate status updates, answers, button states, and language changes back to the UI.

In the constructor, it initializes variables to store the last question, translation, grammar explanation, and example sentences. It also sets up the audio input stream using pyaudio with the specified device and parameters.

class AssistantWorker(QObject):
    status          = Signal(str)
    answer          = Signal(str)
    btn_explain_on  = Signal(bool)
    btn_examples_on = Signal(bool)
    language_changed = Signal(str)

    def __init__(self):
        super().__init__()
        self._last_question    = ""
        self._last_translation = ""
        self._last_grammar     = ""
        self._last_examples    = ""
        self._tts_ready        = False

        self._lang_index = 0
        self._language   = LANGUAGES[0]

        self._explain_requested  = False
        self._examples_requested = False
        self._lang_requested     = False
        self._lock = threading.Lock()

        self._pa     = pyaudio.PyAudio()
        self._stream = self._pa.open(
            format=pyaudio.paInt16,
            channels=CHANNELS,
            rate=SAMPLE_RATE,
            input=True,
            input_device_index=DEVICE_INDEX,
            frames_per_buffer=CHUNK,
        )

The class provides slot methods to handle button clicks for requesting grammar explanations, example sentences, and language changes. These methods set thread-safe flags that the main loop monitors.
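This flag pattern can be sketched in isolation: the Qt slot (running in the UI thread) only sets a boolean under a lock, and the worker loop reads and clears the flag atomically on its next pass. The class and method names below are illustrative, not from the listing:

```python
import threading

class FlagDemo:
    """Minimal version of the thread-safe request-flag pattern."""

    def __init__(self):
        self._explain_requested = False
        self._lock = threading.Lock()

    def on_explain_clicked(self):
        # Called from the UI thread: just record the request
        with self._lock:
            self._explain_requested = True

    def poll(self):
        # Called from the worker loop: read and clear in one locked step
        with self._lock:
            requested = self._explain_requested
            self._explain_requested = False
        return requested
```

Reading and clearing under the same lock guarantees that a click is handled exactly once, even if the loop and the UI thread run concurrently.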

AssistantWorker Run Loop

The run() method contains the main loop that continuously checks the state of hardware buttons and processes user input accordingly.

When Button A is held, it records audio from the microphone. When Button A is released, it normalizes the recorded audio, converts it to WAV format, and sends it to OpenAI’s Whisper model for transcription. The transcribed English text is then translated into the selected language using OpenAI’s GPT model.

Next the translated text is displayed, and a TTS audio file is generated and played back. Button B allows replaying the last generated audio. Pressing the language button cycles through the supported languages.

If the user requests grammar explanations or example sentences, the worker calls the appropriate OpenAI API endpoints to generate the content and updates the display.

def run(self):
    self.status.emit("Hold Button A to speak")

    while True:
        a_pressed = button_a.is_pressed()
        b_pressed = button_b.is_pressed()

        with self._lock:
            explain_req  = self._explain_requested
            examples_req = self._examples_requested
            lang_req     = self._lang_requested
            self._explain_requested  = False
            self._examples_requested = False
            self._lang_requested     = False

        if lang_req and not a_pressed:
            self._lang_index = (self._lang_index + 1) % len(LANGUAGES)
            self._language   = LANGUAGES[self._lang_index]
            self.language_changed.emit(self._language)

        elif a_pressed:
            self._clear_state()
            raw_pcm = record_while_held(self, self._stream)
            if not raw_pcm:
                self.status.emit("Hold Button A to speak")
                continue

            self.status.emit("Transcribing...")
            normalized = normalize(raw_pcm)
            wav_bytes  = pcm_to_wav(normalized)
            question   = transcribe(wav_bytes)

            self.status.emit(f"Translating to {self._language}...")
            translation = translate_only(question, self._language)

            self._last_question    = question
            self._last_translation = translation

            self._refresh_display()
            self.btn_explain_on.emit(True)
            self.btn_examples_on.emit(True)

            self.status.emit("Generating audio...")
            tts_text = extract_first_line(translation)
            generate_tts(tts_text, TTS_FILE)
            self._tts_ready = True
            self._set_default_status()
            play_audio(TTS_FILE)

        elif b_pressed:
            while button_b.is_pressed():
                time.sleep(0.01)
            if self._tts_ready:
                self.status.emit("Replaying...")
                play_audio(TTS_FILE)
                self._set_default_status()

        elif explain_req and self._last_translation:
            self.status.emit("Explaining grammar...")
            self._last_grammar = explain_grammar(
                self._last_question, self._last_translation, self._language
            )
            self._refresh_display()
            self._set_default_status()

        elif examples_req and self._last_translation:
            self.status.emit("Generating examples...")
            self._last_examples = add_examples(
                self._last_question, self._last_translation, self._language
            )
            self._refresh_display()
            self._set_default_status()

        else:
            time.sleep(0.02)

State Management Methods

The worker class includes helper methods to reset the internal state, update the displayed text, and set the default status message. These methods ensure that the UI reflects the current state of the application accurately.
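The tutorial does not show these helper methods, so here is a minimal sketch of how they might look. The attribute and signal names (answer, status, btn_explain_on, btn_examples_on) are taken from the run() loop above, but the exact bodies are my assumptions:

```python
# Hedged sketch of the worker's state helpers; the signal names match those
# used in run(), but the method bodies are assumptions, not the original code.

class WorkerStateMixin:
    def _clear_state(self):
        """Reset everything that belongs to the previous question."""
        self._last_question    = ""
        self._last_translation = ""
        self._last_grammar     = ""
        self._last_examples    = ""
        self._tts_ready        = False
        self.btn_explain_on.emit(False)
        self.btn_examples_on.emit(False)

    def _refresh_display(self):
        """Rebuild the text area from the stored pieces of the answer."""
        parts = [f"You: {self._last_question}",
                 f"{self._language}: {self._last_translation}"]
        if self._last_grammar:
            parts.append(f"Grammar:\n{self._last_grammar}")
        if self._last_examples:
            parts.append(f"Examples:\n{self._last_examples}")
        self.answer.emit("\n\n".join(parts))

    def _set_default_status(self):
        """Show the idle prompt again after an action has finished."""
        self.status.emit("Hold Button A to speak")
```

Keeping all display text in one place like this means every code path that changes the stored question, translation, grammar, or examples only has to call _refresh_display() to bring the UI up to date.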

AssistantUI Class

This class defines the graphical user interface using Qt widgets. It creates a fixed-size window with an orange background and black text, matching the color scheme defined earlier.

The UI consists of a top row with buttons for font size adjustment, volume control, and language selection. Below that, a status label displays messages to the user. The main text area shows the transcribed, translated, and explanatory text.

At the bottom, buttons allow scrolling through the text and requesting grammar explanations or example sentences. The buttons are styled consistently using the previously defined styles.

The class provides methods to update the status text, displayed answer, enable or disable buttons, and change the language label. It also handles user interactions for scrolling and adjusting font size and volume.

class AssistantUI(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Voice Chatbot")
        self.setFixedSize(240, 320)

        self.setStyleSheet(
            f"QWidget  {{ background-color: {ORA_BG}; color: {ORA_TEXT}; }}"
            f"QTextEdit {{ background-color: #ffffff; color: {ORA_TEXT};"
            f"             border: none; }}"
        )

        layout = QVBoxLayout(self)
        layout.setContentsMargins(4, 4, 4, 4)
        layout.setSpacing(3)

        lang_row = QHBoxLayout()
        lang_row.setSpacing(4)

        _dark_style = _btn_style(bg=ORA_DARK, en=ORA_DARK, pr="#8a3d00")

        self.btn_font_minus = QPushButton("-")
        self.btn_font_minus.setFixedSize(28, 26)
        self.btn_font_minus.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_font_minus)

        self.btn_vol_down = QPushButton("<")
        self.btn_vol_down.setFixedSize(28, 26)
        self.btn_vol_down.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_vol_down)

        self.btn_language = QPushButton(LANGUAGES[0])
        self.btn_language.setFixedHeight(26)
        self.btn_language.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_language, stretch=1)

        self.btn_vol_up = QPushButton(">")
        self.btn_vol_up.setFixedSize(28, 26)
        self.btn_vol_up.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_vol_up)

        self.btn_font_plus = QPushButton("+")
        self.btn_font_plus.setFixedSize(28, 26)
        self.btn_font_plus.setStyleSheet(_dark_style)
        lang_row.addWidget(self.btn_font_plus)

        layout.addLayout(lang_row)

        self._font_size = 10
        self.btn_font_plus.clicked.connect(self._font_increase)
        self.btn_font_minus.clicked.connect(self._font_decrease)
        self.btn_vol_down.clicked.connect(self._vol_decrease)
        self.btn_vol_up.clicked.connect(self._vol_increase)

        self.status = QLabel("Starting...")
        self.status.setAlignment(Qt.AlignCenter)
        self.status.setStyleSheet(
            "QLabel { background-color: #ffffff; color: #000000; padding: 1px; }"
        )
        layout.addWidget(self.status)

        self.text = QTextEdit()
        self.text.setReadOnly(True)
        self.text.setTextInteractionFlags(Qt.NoTextInteraction)
        layout.addWidget(self.text, stretch=1)

        QScroller.grabGesture(
            self.text.viewport(),
            QScroller.LeftMouseButtonGesture
        )

        btn_row = QHBoxLayout()
        btn_row.setSpacing(4)

        _scroll_style = _btn_style(bg=ORA_DARK, en=ORA_DARK, pr="#8a3d00")

        self.btn_scroll_up = QPushButton("▲")
        self.btn_scroll_up.setFixedSize(40, 26)
        self.btn_scroll_up.setStyleSheet(_scroll_style)
        btn_row.addWidget(self.btn_scroll_up)

        self.btn_explain  = QPushButton("Explain")
        self.btn_examples = QPushButton("Examples")

        for btn in (self.btn_explain, self.btn_examples):
            btn.setEnabled(False)
            btn.setFixedHeight(26)
            btn.setStyleSheet(_btn_style())
            btn_row.addWidget(btn, stretch=1)

        self.btn_scroll_down = QPushButton("▼")
        self.btn_scroll_down.setFixedSize(40, 26)
        self.btn_scroll_down.setStyleSheet(_scroll_style)
        btn_row.addWidget(self.btn_scroll_down)

        layout.addLayout(btn_row)

        self.btn_scroll_down.clicked.connect(self._scroll_down)
        self.btn_scroll_up.clicked.connect(self._scroll_up)
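The slot methods that main() connects to (update_status, update_answer, and friends) are not shown in the tutorial. A minimal sketch, using the widget names from the __init__ above, might look like this; the bodies are assumptions:

```python
# Hedged sketch of the AssistantUI slots referenced in main(); the widget
# names match the constructor above, but the bodies are assumptions.

class AssistantUISlots:
    def update_status(self, text):
        """Show a short status message above the text area."""
        self.status.setText(text)

    def update_answer(self, text):
        """Replace the main text area content with the latest answer."""
        self.text.setPlainText(text)

    def set_explain_enabled(self, on):
        self.btn_explain.setEnabled(on)

    def set_examples_enabled(self, on):
        self.btn_examples.setEnabled(on)

    def update_language_label(self, lang):
        """Reflect the newly selected target language on the button."""
        self.btn_language.setText(lang)
```

Because the worker emits signals rather than touching widgets directly, these slots always run on the Qt GUI thread, which is the only thread allowed to modify widgets.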

Audio Helper Functions

Several helper functions handle audio processing tasks. The normalize() function scales the recorded PCM audio to maximize volume without clipping. record_while_held() records audio from the microphone while Button A is pressed, and pcm_to_wav() converts raw PCM bytes into WAV format, which the transcription API requires.
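To make the normalization and WAV conversion concrete, here is a minimal sketch of how normalize() and pcm_to_wav() could be implemented for mono 16-bit audio. The sample rate, target peak, and exact scaling strategy are assumptions:

```python
# Hedged sketch of the audio helpers for mono 16-bit PCM; the sample rate
# and target peak level are assumptions, adjust them to your recording setup.
import array
import io
import wave

SAMPLE_RATE = 16000  # assumed recording rate

def normalize(pcm: bytes, target_peak: int = 30000) -> bytes:
    """Scale 16-bit PCM so its loudest sample reaches target_peak."""
    samples = array.array("h")
    samples.frombytes(pcm)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return pcm          # silence: nothing to scale
    factor = target_peak / peak
    for i, s in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(s * factor)))
    return samples.tobytes()

def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw mono 16-bit PCM bytes in a WAV container in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
    return buf.getvalue()
```

Normalizing before transcription helps because the UNIHIKER's built-in microphone records at a fairly low level, and Whisper transcribes quiet audio less reliably.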

Next, the generate_tts() function sends text to OpenAI’s TTS API and saves the resulting audio file. Finally, play_audio() plays the generated audio with pygame at the current volume.
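A possible implementation of these two functions is sketched below. The TTS model and voice names are assumptions (any of OpenAI’s TTS models and voices will work), and the pygame playback loop is one common way to block until the audio finishes:

```python
# Hedged sketch of the TTS and playback helpers; the model and voice names
# are assumptions, and OPENAI_API_KEY must be set in the environment.

def generate_tts(text: str, path: str) -> None:
    """Ask OpenAI's TTS endpoint to speak `text` and save it to `path`."""
    from openai import OpenAI       # imported lazily to keep startup light
    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",    # assumed model; "tts-1" also works
        voice="alloy",              # assumed voice
        input=text,
    ) as response:
        response.stream_to_file(path)

def play_audio(path: str, volume: float = 1.0) -> None:
    """Play an audio file with pygame and block until playback finishes."""
    import pygame
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.set_volume(volume)   # 0.0 .. 1.0
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)               # poll every 100 ms
```

Streaming the TTS response straight to a file avoids holding the whole audio clip in memory, which matters on a board with only 512 MB RAM.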

OpenAI Helper Functions

These functions interact with OpenAI’s API to perform transcription, translation, grammar explanation, and example sentence generation.

The transcribe() function sends the recorded WAV audio to the Whisper model to obtain English text. translate_only() translates the English question into the selected language without additional explanations.

The explain_grammar() and add_examples() functions request grammar explanations and example sentences respectively, using GPT with prompts tailored for language teaching.
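As an illustration, transcribe() and translate_only() could look like the sketch below. The model names and the system prompt are assumptions; explain_grammar() and add_examples() would follow the same pattern with teaching-oriented prompts:

```python
# Hedged sketch of the OpenAI helpers; model names and prompts are
# assumptions, and OPENAI_API_KEY must be set in the environment.
import io

def transcribe(wav_bytes: bytes) -> str:
    """Send WAV audio to the Whisper model and return the English text."""
    from openai import OpenAI
    client = OpenAI()
    buf = io.BytesIO(wav_bytes)
    buf.name = "speech.wav"     # the SDK uses the name to detect the format
    result = client.audio.transcriptions.create(model="whisper-1", file=buf)
    return result.text.strip()

def translate_only(question: str, language: str) -> str:
    """Translate the English text into the target language, nothing else."""
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",    # assumed model
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {language}. "
                        "Reply with the translation only, no explanations."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()
```

Keeping the translation prompt strict ("translation only") matters because run() feeds the first line of the result straight into generate_tts(), so any extra commentary would be read aloud too.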

Main Function

The main() function initializes the Qt application, creates instances of the UI and worker classes, and sets up a separate thread for the worker to run concurrently.

It connects signals and slots between the worker and UI to update the interface based on the worker’s progress and user interactions. Finally, it starts the application event loop.

def main():
    app = QApplication(sys.argv)

    ui     = AssistantUI()
    worker = AssistantWorker()
    thread = QThread()

    worker.moveToThread(thread)
    thread.started.connect(worker.run)
    worker.status.connect(ui.update_status)
    worker.answer.connect(ui.update_answer)
    worker.btn_explain_on.connect(ui.set_explain_enabled)
    worker.btn_examples_on.connect(ui.set_examples_enabled)
    worker.language_changed.connect(ui.update_language_label)

    ui.btn_explain.clicked.connect(worker.on_explain_clicked,  Qt.DirectConnection)
    ui.btn_examples.clicked.connect(worker.on_examples_clicked, Qt.DirectConnection)
    ui.btn_language.clicked.connect(worker.on_language_clicked, Qt.DirectConnection)

    ui.show()
    thread.start()

    sys.exit(app.exec())

In summary, this code integrates hardware button input, audio processing, AI language services, and a responsive GUI to create an interactive voice-based language tutor. It leverages OpenAI’s models for transcription, translation, and language teaching.

Conclusions

In this tutorial you learned how to implement a simple AI Language Tutor on the UNIHIKER M10 using OpenAI services. I recommend that you also read the Voice Assistant on UNIHIKER M10 with OpenAI tutorial, which covers some basics that are not part of this tutorial.

While the AI Language Tutor in this tutorial is already useful for language learning, there are many possible extensions to make it even more useful. For instance, the program could store the translations and explanations to revisit later. It could generate short stories around sentences and read them out. It could extract verbs and nouns to be trained separately. And much more …

Have fun extending the tutor for your purposes, and if you have any questions feel free to leave them in the comment section.

Happy Tinkering 😉