Mark Making Things



This is a work in progress, so check back for updates.

Introduction

Parts & Components

Feature Overview

Physical Build

Video Collection


Here are some high level goals:

This is the first feature I built into Ron. Enjoy! See below for more about how all this works.

Starting with the hardware, this is the core "brain" of the robot: where all the software runs, and where all the hardware is controlled from.

Next we have some support hardware. These are the extra bits and pieces that we'll be using to make him do stuff.

Other hardware TBC

There are still more hardware items to purchase: some parts I have ideas on, others I just know are out there waiting to be discovered.

Microphone - Looking at a more robust microphone for use once we get beyond the prototype phase. Something like the SeeedStudio ReSpeaker Mic Array.

Wheels & chassis - The 2WD Mobile Platform for Arduino is currently what I am looking at for the drive system.

Battery pack(s) will absolutely be required!

Many little pieces here and there to get it all together

A note about my code.

I have no problem sharing any of my code with anyone, so if you would like to see it, let me know. I'm refraining from making it generally public (at this time, at least) because:


Raspberry Pi

The Raspberry Pi is a 4B model with 8GB of RAM. I chose this model because I hope to do lots of machine learning and image processing. It is running the 64-bit operating system.

I'm making use of many Python libraries to make all the magic happen.

Arduino Nano

The Nano is used exclusively for controlling the LED matrix for the face, and any servos for movement, such as turning the head or moving the arms.

Wake Word

I spent a lot of time looking for the best way to implement the wake word. Many services have long since been deprecated and/or not updated to support the latest OS or Python versions. I landed on Porcupine Wake Word by picovoice.ai. The service is free (with limits) and their GitHub has some great examples: https://github.com/Picovoice/porcupine
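
The heart of those examples is a loop that feeds fixed-size audio frames to the engine. Here is a minimal sketch of that loop; the stub engine below stands in for the real `pvporcupine` engine (which needs an access key and a microphone), but it honours the same contract: `process(pcm)` returns the detected keyword's index, or -1.

```python
def wake_word_loop(engine, next_frame):
    """Feed audio frames to a Porcupine-style engine.

    engine.process(pcm) returns the index of the detected keyword,
    or -1 if nothing was heard -- the same contract as pvporcupine.
    next_frame() yields one frame of PCM audio (or None to stop).
    """
    while True:
        pcm = next_frame()
        if pcm is None:
            return None          # audio source closed
        result = engine.process(pcm)
        if result >= 0:
            return result        # index of the keyword that fired

class StubEngine:
    """Stand-in for pvporcupine.create(...), so the loop can be
    exercised without a microphone or access key."""
    def __init__(self, trigger):
        self.trigger = trigger   # frame payload that counts as the wake word
    def process(self, pcm):
        return 0 if pcm == self.trigger else -1
```

In the real build, the engine comes from `pvporcupine.create(access_key=..., keywords=[...])` and the frames come from PyAudio.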

Speech-to-Text

For speech to text I am using Google's Speech-to-Text. While it's not free, the price is low and, in my experience, it is the best performing. If you are looking for something completely free, Picovoice has on-device recognition, which I honestly have not tried in terms of performance since upgrading to the Pi 4. I think with a good mic (i.e. not a webcam) and on this hardware, it would work really well.

The Google service has some great examples to get you started: https://cloud.google.com/speech-to-text/docs/samples

Text-To-Speech

For text to speech, i.e. how it talks, I landed on a Microsoft Text to Speech solution. Originally I could not get it working with my Pi/Python/OS versions, and ended up offloading this to my local PC. However, as my Python knowledge improved, I was able to target the REST API directly.

For bonus marks:

Advanced Speech Functions

To enable more responsiveness, and a better experience, I developed "speech categories". These are what I call groups of responses to the same situation. For example, the "hello" category may include these possible responses:

These responses are pre-generated during startup, and can be referenced by a "speak from category" function.
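
A minimal sketch of how that could work; the category names and phrases below are my own placeholders, not Ron's actual lists:

```python
import random

# Hypothetical categories -- the real lists live in Ron's configuration.
CATEGORIES = {
    "hello": ["Hello there!", "Hi, how are you?", "Hey, good to see you."],
    "goodbye": ["See you later!", "Bye for now."],
}

def pregenerate(synthesize, cache):
    """Run every phrase through the TTS service once, at startup."""
    for category, phrases in CATEGORIES.items():
        for phrase in phrases:
            cache[phrase] = synthesize(phrase)   # e.g. a WAV file path

def speak_from_category(category, cache, play, rng=random):
    """Pick a random pre-generated response and play it -- no API call."""
    phrase = rng.choice(CATEGORIES[category])
    play(cache[phrase])
    return phrase
```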

To also reduce API calls, "speak from parts" was created. This allows for a list of phrases to be passed in. This is helpful for things like the weather, where the parts may be:

This allows the cache to be used for each part, presuming the entire sentence isn't already cached.
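
The per-part caching idea can be sketched like this (the `synthesize` and `play` callbacks stand in for the real TTS and audio code):

```python
def speak_from_parts(parts, cache, synthesize, play):
    """Speak a sentence assembled from phrases, synthesizing only the
    parts that are not already cached. For a weather report, only the
    numbers change from day to day, so the fixed phrases are free."""
    for part in parts:
        if part not in cache:
            cache[part] = synthesize(part)   # one API call per new part
        play(cache[part])
```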

Sound Demo

Here is a sound bite:

The MS website for text to speech has a great tool for finding and customizing your voice. You will need to create an SSML (Speech Synthesis Markup Language) file. Here is the example for the above:
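
Ron's actual SSML file isn't reproduced here, but an Azure SSML document has roughly this shape (the voice name, prosody values, and text below are placeholders, not Ron's real settings):

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="0%" pitch="0%">
      Hello! My name is Ron.
    </prosody>
  </voice>
</speak>
```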

If you want to design your own voice, head over to https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features

Here is approximately what the UI would look like for his voice.


Program concurrency, multithreading and multiprocessing

When I started building this, it was a few methods/functions being called one after the other when things happened. For example, the main code would wait at the wake word, then when activated, the next call was to Speech-to-Text.

My lack of knowledge of PyAudio and audio devices in general meant a long delay between when the wake word was said and when recording of the command could begin. So I split them into separate threads, using "true/false" flags in files written to disk so they could signal each other.

I then discovered that Python will not actually execute more than one thread at a time (because of the Global Interpreter Lock), which IMO defeats the purpose. To work around this, all core functions are now individual Python scripts, started from the main script. They run independently from each other, using the "flag file" method to communicate. The article Multithreading VS Multiprocessing in Python explained threading & multiprocessing very well.

SD card latency was the next hurdle. These flag files worked great, but at times the performance was poor. Given the contents of these files did not need to persist across reboots, I moved them to a RAM disk. This may have been the single biggest performance improvement. I followed this guide: RASPBERRY PI 4 RAM DISK.

Facial & Object Recognition

A key part of this build was for it to recognize people around it and respond accordingly. The package used for this was OpenCV, which is somewhat of a standard for image capture and analysis.

Now, OpenCV may actually be the hardest thing to get working. I lost weeks getting it going, even resorting to using others' pre-built images. There are many incompatibilities all over the place, including with the latest release if you are on 32-bit. I found the tutorial below worked great when I was on the 32-bit version. If you are on 64-bit, I believe the install is really simple if you look for it, something like a "pip" one-liner.

Face Recognition With Raspberry Pi and OpenCV

After getting it working, I removed references to any GUI components and added the following:

Object and Animal Recognition With Raspberry Pi and OpenCV

Next up was object detection. I have this working but am not currently doing anything with it, mostly on account of bad lighting conditions in my basement: it finds cats, dogs and couches everywhere, despite the camera being trained on me.

The list of built-in functions he can do is growing all the time. Below is a simple table I've created to briefly summarize them.

Some other features are:

One thing I wanted to do was give the robot a personality. I'm not a behavioral scientist, but my ideas to emulate personality are:

Features listed here are ones written after this article was started; for the ones above, I have taken the time to detail them.

The coin flip is really simple, but in keeping with the "personality", I've added extra features:

Here is a code snippet. Code in BOLD indicates a custom procedure to perform the action; they are pretty obvious as to their functions.

Ideally the options and possibilities would be configurable & not hard coded.
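
In that spirit, here is a rough sketch of a coin flip with a personality touch. The probability, phrases, and the edge-landing quip are all my own placeholder inventions, not Ron's actual code:

```python
import random

def coin_flip(rng=random):
    """Flip a coin, with a small chance of a 'personality' outcome
    before flipping again."""
    if rng.random() < 0.01:   # rare novelty response
        return "It landed on its edge! Flipping again... " + coin_flip(rng)
    return rng.choice(["Heads!", "Tails!"])
```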

Using servos to move things is a pretty standard approach for robotics. Using the small, cheap off the shelf servos with easy to use Arduino libraries gets you going quickly. Here is a quick video of the "say hi" functionality.

https://youtube.com/shorts/dMeqeBoNYow

I decided to use 2 ultrasonic distance sensors for this project. My main concern was handling angled surfaces; with two sets of readings, I can determine whether I am hitting an angle or not. I built a simple harness out of leftover lumber for testing:


These are 3.3V ultrasonic sensors that attach directly to the Raspberry Pi. To achieve more reliable results, a measurement is taken from each sensor 5 times, alternating. The list of measurements is then sorted and we take the "middle" value. This all happens in less than a second.
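
The alternating, sorted-middle scheme can be sketched as follows; the two read callbacks stand in for the actual GPIO echo-timing code:

```python
def robust_distance(read_left, read_right, samples=5):
    """Take `samples` readings from each sensor, alternating, then return
    the median of each list. Stray echoes off angled surfaces end up at
    the extremes of the sorted list and are discarded."""
    left, right = [], []
    for _ in range(samples):
        left.append(read_left())
        right.append(read_right())
    left.sort()
    right.sort()
    mid = samples // 2
    return left[mid], right[mid]
```

Comparing the two medians then tells you whether the surface ahead is square-on or angled.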

A 3rd sensor will be used to measure distance off the ground - specifically at the front. The robot will be weighted at the rear, so the downward-pointing 3rd sensor will notice when it is over the edge of a step and command the robot to reverse or stop. Additionally, if it's not moving and the value changes, the robot can make a comment about being picked up.

This sounded simpler on paper - when enabled, make chicken noises instead of real responses. In reality, it was a little harder. The solution was to implement this in the "playSound" function. The logic is as follows:
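
The gist of that logic can be sketched as a wrapper around the real playback call; the clip names and signature here are my own placeholders:

```python
import random

# Placeholder clip names -- the real files are cut from the recordings
# in mmalpartida/chicken-assistant-skill, mentioned below.
CHICKEN_CLIPS = ["cluck1.wav", "cluck2.wav", "bawk1.wav"]

def play_sound(path, play, chicken_mode=False, rng=random):
    """Wrapper around the real audio playback: when chicken mode is on,
    substitute a random chicken clip for whatever was about to be said."""
    if chicken_mode:
        path = rng.choice(CHICKEN_CLIPS)
    play(path)
    return path
```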

I used the chicken sounds recorded & posted in this GitHub project: mmalpartida/chicken-assistant-skill. Do check out that project; he built an animatronic chicken & an assistant AI. I then cut the sounds into smaller sub-files. You can download the resulting chicken sample files & sample code is below.

I had this idea that Ron could read a barcode and act accordingly. If it was a QR code, he could send the URL (if present) to my phone, or read the details aloud. If it was a barcode, he would in turn hit some yet-to-be-defined API to get pricing information & a product description.

The go-to choice for this in Python is OpenCV with pyzbar. The code is really straightforward, especially if you do not need to see the screen yourself. Here is the base code:

The method process_barcode is where my code lives. Here is a snippet:

Quick notes about these methods:

sendNote - uses this implementation of the Pushbullet API to send.

getPageSummary - uses Newspaper3k to quickly scan the website.

Access to AI & machine learning has become commonplace in the last 12 months. I've been incorporating as much of it as I can into this project!

DeepAI - https://deepai.org/ - has a range of APIs you can target easily, either free or low cost.

We're a "Google" house, with Chromecasts in every main room & speakers on all levels. I've used the pychromecast library to control this. The logic works as follows:

I've written the code for this however I am not sure it will make it into the build. Giving Ron an "ask twitter" ability may not necessarily yield content that is useful in this context, but time will tell.

Using the Twitter v2 search API, I use a custom search to try and get the content I need:

Filtering out retweets & replies, then omitting some words as a starting point. The langdetect and textstat libraries then filter those results for meaningful content.
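
That pipeline might look something like the sketch below. The blocklist words are placeholders, and the two lambdas are crude stdlib stand-ins for what langdetect and textstat actually do:

```python
BLOCKLIST = {"giveaway", "follow", "rt"}   # placeholder word list

def keep_tweet(text, is_retweet, is_reply,
               looks_english=lambda t: all(ord(c) < 128 for c in t),
               is_readable=lambda t: len(t.split()) >= 5):
    """Drop retweets/replies and blocked words, then apply language and
    readability checks (langdetect / textstat in the real build)."""
    if is_retweet or is_reply:
        return False
    words = {w.strip("#@.,!?").lower() for w in text.split()}
    if words & BLOCKLIST:
        return False
    return looks_english(text) and is_readable(text)
```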

I currently have no plans to allow Ron to post content directly!

I wrote a couple of simple games to play on Ron: the classic Hangman and a simple number guessing game. For Hangman, the game uses pre-drawn hangman images to display progress. You can download the hangman images from here if you would like to use them.
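
The number guessing game boils down to one feedback rule per round; a minimal sketch (the response phrases are placeholders):

```python
def guess_feedback(secret, guess):
    """One round of the number guessing game: tell the player which
    way to adjust, or that they've won."""
    if guess < secret:
        return "Higher!"
    if guess > secret:
        return "Lower!"
    return "You got it!"
```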


A self-aware robot wouldn't be complete if he didn't know about Chuck Norris. I've used this Chuck Norris API to pull in "facts" both randomly and on demand. Warning: you might want to filter results through a profanity/kid-friendly filter first.

Old school ASCII art goes way back to the days of dial-up bulletin boards & perhaps even beyond. I've used a couple of services to recreate this feature:

The image below highlights the process:


Constructing the physical body may be the hardest part. Code is easy to google, debug and re-write. But making something that is:

All at the same time is a fairly difficult task. If I were proficient in 3D printing, things would be somewhat easier. But I'll be using everyday materials: plastic, wood & anything else I can find in between.

This will sit between the head and the top of the body. Technically it will be part of the head, so that it can turn & rotate. However, we may end up having it as part of the main body, or splitting it into 2 different sections. Here are some WIP shots of the sensor array:


Here are some videos of Ron doing his thing. Randomly added here where they don't otherwise fit with the rest of the article.

Tell me some Dad jokes

We use the https://icanhazdadjoke.com/ API to have Ron the Robot tell some random dad jokes.
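
The API returns JSON when asked for it via the `Accept` header (the site also asks callers to send a descriptive `User-Agent`). A minimal stdlib sketch:

```python
import json
import urllib.request

def build_joke_request():
    """Build the request icanhazdadjoke expects for a JSON response."""
    return urllib.request.Request(
        "https://icanhazdadjoke.com/",
        headers={"Accept": "application/json",
                 "User-Agent": "Ron the Robot (hobby project)"},
    )

def fetch_dad_joke():
    """Hit the API and return the joke text.
    (Network call -- not exercised here.)"""
    with urllib.request.urlopen(build_joke_request()) as resp:
        return json.loads(resp.read())["joke"]
```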

Any ideas or feedback? Want to see some of the code? Drop me a line in the comments, or on Twitter @legoszm or Instagram @markmakingthings.
