...

My Final Project

Summary

The player wears an old suit and a top hat and carries a suitcase. They pick up an old Nokia phone, press the green call button and say “Business time” to start the game.

The game begins. A coin stands in the middle of a platform (on rails), and the platform starts to tilt, causing the coin to roll. If the coin rolls to either end of the platform, the player loses.

The player has to keep the coin balanced on the platform by shouting “BUY!” (the platform tilts to the left) and “SELL!” (it tilts to the right) into the phone. The game gets more difficult over time.
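The rules above can be sketched as a small simulation. This is only a rough model to make the game loop concrete, not the code running on the hardware: the class name, the units, the drift that grows over time and the simplified physics are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CoinGame:
    half_length: float = 0.5    # metres from centre to either end (assumed)
    tilt_step: float = 0.02     # radians added per recognised word (assumed)
    base_drift: float = 0.01    # radians/second of unprompted tilt (assumed)
    ramp_seconds: float = 30.0  # time for the drift rate to double (assumed)
    position: float = 0.0       # coin position; negative is left of centre
    velocity: float = 0.0
    tilt: float = 0.0           # platform angle; negative means the left end is lower
    elapsed: float = 0.0

    def command(self, word: str) -> None:
        """Apply a recognised word: BUY tilts left, SELL tilts right."""
        if word == "BUY":
            self.tilt -= self.tilt_step
        elif word == "SELL":
            self.tilt += self.tilt_step

    def step(self, dt: float) -> bool:
        """Advance by dt seconds; return False once the coin has reached an end."""
        # The unprompted drift grows linearly with time, so the game gets harder.
        drift = self.base_drift * (1.0 + self.elapsed / self.ramp_seconds)
        self.tilt += drift * dt
        # Simplified physics: the coin accelerates down the slope without friction.
        self.velocity += 9.81 * math.sin(self.tilt) * dt
        self.position += self.velocity * dt
        self.elapsed += dt
        return abs(self.position) < self.half_length
```

A game loop would call `command()` whenever a word is recognised and `step()` on a fixed timer, ending the round as soon as `step()` returns False.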

Speech

In week 43, I had already started looking into speech recognition for distinguishing BUY from SELL. Judging by the spectrogram (below), I could probably use something quite elementary and might not even need machine learning.

“Buy” on the left, “sell” on the right. Note how abrupt the “b” is compared to the “s”, which always starts out as “sssss”. The [a] and [e] also look like they shouldn’t be too difficult to tell apart.
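One elementary approach of the kind hinted at by the spectrogram would be to look only at the start of the utterance: a hissing “sssss” is noise-like and changes sign very often, while a voiced onset does not. The sketch below uses the zero-crossing rate for this. It is not what the project ended up using, and the function names, the 50 ms window, the 16 kHz sample rate and the threshold are all assumptions; it also assumes the recording has already been trimmed to start at the word.

```python
import math
import random


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(samples) - 1, 1)


def classify_onset(samples, sample_rate=16000, onset_ms=50, threshold=0.25):
    """Guess BUY or SELL from the first onset_ms milliseconds of an utterance."""
    n = int(sample_rate * onset_ms / 1000)
    # A noisy, hiss-like onset has a high zero-crossing rate and is taken as "s".
    return "SELL" if zero_crossing_rate(samples[:n]) > threshold else "BUY"


# Synthetic stand-ins, not real recordings:
# white noise for the "sss" hiss, a 150 Hz tone for a voiced onset.
random.seed(0)
hiss = [random.uniform(-1.0, 1.0) for _ in range(800)]
tone = [math.sin(2 * math.pi * 150 * i / 16000) for i in range(800)]
print(classify_onset(hiss), classify_onset(tone))
```

Real speech would need at least silence trimming and a threshold tuned on actual recordings, so this is only a starting point for experiments.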

I saw voice detection as the most technically difficult part of the project. I studied the basics of sound processing (in an ML context) and voice recognition from Hugging Face’s audio course.

I looked into many alternatives for detecting the words. I ruled out online solutions due to latency and reliability issues. I wanted something offline and simple enough to fit on an Arduino, which ruled out more sophisticated ML-based speech recognition tools like Whisper.

I ended up with Cyberon’s speech recognition software. I also found free options for phoneme-based recognition, but they seemed quite old.

Mobile phone

Nokia 6110. An Arduino RP2040 inside the phone recognizes the voice commands and transmits them over Wi-Fi to the UNO.

Platform

The UNO is connected to a stepper motor, which tilts the platform clockwise and counter-clockwise.
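The references at the end mention possibly reusing acceleration logic for the platform turns. One common way to do that with a stepper motor is to shorten the delay between steps gradually instead of starting at full speed. The sketch below only illustrates that idea: the function name and all rates are assumed values, it is written in Python rather than Arduino code, and it covers only the speed-up phase, not slowing down again.

```python
def ramp_delays(steps, start_rate=200.0, max_rate=1000.0, accel=2000.0):
    """Return the delay in seconds before each step of an accelerating move.

    Rates are in steps/second and accel is in steps/second^2 (all assumed values).
    """
    delays = []
    rate = start_rate
    for _ in range(steps):
        delay = 1.0 / rate
        delays.append(delay)
        # Speed gained during this step, capped at the maximum rate.
        rate = min(max_rate, rate + accel * delay)
    return delays
```

The motor loop would then wait `delays[i]` seconds before issuing step `i`, in whichever direction the last recognised word asks for.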

Extras

  • Display with scrolling text
  • LED lights with BUY and SELL signs, giving the player more explicit feedback

References

Reference for platform setup: balancing a ping-pong ball. Mine is conceptually simpler, but I could use similar logic for accelerating the platform turns.

Reference mentioning issues with distance sensors: balancing a metal ball on a seesaw.