We’ve all been there: you ask Siri a question and it responds with the ever-frustrating “Sorry, I didn’t understand”. It could be an accent or dialect problem, the fact that Siri isn’t trained on the massive volume of data that Google’s AI is trained on, or just that Apple absolutely dropped the ball on Siri. Apple launched its voice AI as an app nearly 13 years ago, yet more than a decade later Siri still feels noticeably dumb and unhelpful. Google’s voice AI seems to be overwhelmingly the most popular choice these days, but there’s a new kid on the block that’s absolutely eating Google’s lunch, at least in the search department.
Unveiled less than a year ago, ChatGPT from OpenAI took the world by storm with its incredible natural language processing capabilities, reaching one million users in just five days and 100 million users in just two months (that’s faster than the growth of social media giants like Facebook, Google and even Snapchat). ChatGPT’s intelligent, human-like responses make it a compelling AI chatbot, especially given that it understands natural sentences much better than most other AI tools and is more likely to respond with a helpful answer than an apology. Developer Mate Marschalko saw this as a brilliant opportunity to integrate ChatGPT’s intelligence with Siri, turning it into a much more useful voice AI. With a bit of hacking (which only took him about an hour), Marschalko combined Siri’s voice capabilities with ChatGPT’s NLP intelligence using Apple’s Shortcuts. The result? A much better voice AI that fetches better search results, holds more meaningful conversations, and even lets you control your smart home in a much more “human-friendly” way…almost rivaling Tony Stark’s JARVIS in terms of usability. The best part? You can do it too!
Marschalko lists his entire procedure in a Medium blog post that I definitely recommend checking out if you want to build your own ‘SiriGPT’, using an approach that requires absolutely no coding experience. “I asked GPT-3 to pretend it was the smart brain in my house, carefully explaining what it has access to around the house and how it should respond to my requests,” he said. “I explained all this in plain English with no program code involved.”
The video above shows exactly how Marschalko’s ‘SiriGPT’ works. His home is filled with dozens of lights, thermostats, underfloor heating, ventilation units, cameras and much more, making it the perfect testing ground for almost any use case. Marschalko starts by dividing his tasks into four request types, labeled Command, Query, Answer, and Clarify. Each request type has its own process that GPT-3 follows to determine what needs to be done.
The real magic lies in how the assistant understands even indirect requests from Marschalko and translates them into meaningful actions. While Siri and other AI assistants only respond to direct requests such as “turn on the lights” or “open the garage door”, GPT-3 allows for more nuanced conversations. In one example, Marschalko says, “Notice that I’m recording this video in the dark, in the office. Can you do something about that?” and the assistant immediately turns on the light while replying with an AI-generated response instead of a template response. In another example, he says “my wife is on her way home and will be here in 15 minutes. Turn on the lights for her outside right before she parks”, to which the assistant responds with “The lights should be turned on when your guest arrives!”, demonstrating two powerful things: A. the ability to understand a request as complex as turning on a specific light after a delay of a few minutes, and B. the ability to respond in a natural way that conveys it understood exactly what you wanted done.
Marschalko plugged all of this into a shortcut called Okay Smart Home. To operate it, all he has to do is activate Siri, say the name of the shortcut (in this case, “Okay Smart Home”) and then start talking to his assistant. The four request types allow Marschalko to cover all kinds of scenarios, from controlling smart home appliances with the Command request to asking for the status of an appliance (such as the temperature of a room or the oven) with the Query request. The Answer request covers more chat-centric questions, such as asking the AI for recommendations, suggestions or general information from around the web. The final Clarify request lets the AI ask you to repeat or rephrase your question if it was unable to detect any of the three previous request types.
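To make the four request types a little more concrete, here is a minimal Python sketch of how a reply from the model could be routed once it has been classified. This is not Marschalko’s shortcut (which he says involved no program code); the JSON shape, the field names "action", "target", "value" and "text", and the function route_reply are all assumptions made up for illustration.

```python
import json

# Labels taken from the four request types named in the article.
REQUEST_TYPES = {"command", "query", "answer", "clarify"}

CLARIFY_FALLBACK = "clarify: ask the user to rephrase"


def route_reply(raw_reply: str) -> str:
    """Describe what an assistant might do with one model reply.

    Assumption (not from the source): the model was asked to reply with a
    JSON object such as {"action": "command", "target": ..., "value": ...}.
    """
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        # Output that is not valid JSON is treated like a Clarify request.
        return CLARIFY_FALLBACK

    if not isinstance(reply, dict):
        return CLARIFY_FALLBACK

    action = str(reply.get("action", "")).lower()
    if action not in REQUEST_TYPES:
        return CLARIFY_FALLBACK

    if action == "command":
        # Control a device, e.g. switch a light on.
        return f"command: set {reply.get('target')} to {reply.get('value')}"
    if action == "query":
        # Read back the state of a device, e.g. the oven temperature.
        return f"query: read state of {reply.get('target')}"
    if action == "answer":
        # General chat-style answer, spoken back to the user.
        return f"answer: say {reply.get('text')!r}"
    return CLARIFY_FALLBACK


if __name__ == "__main__":
    print(route_reply('{"action": "command", "target": "office light", "value": "on"}'))
    print(route_reply("I am not sure what you mean"))
```

Falling back to Clarify for anything the code cannot parse mirrors the role the article gives that request type: it is the catch-all when none of the other three fits.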
While this GPT-powered assistant certainly runs circles around the visibly dumber Siri, it doesn’t come for free. You must set up an OpenAI account and purchase tokens to access the API. “Using the API will cost about $0.014 per request, so you can perform over 70 requests for $1,” says Marschalko. “Remember, this is considered expensive because our request is very long, so with a shorter one, you pay less proportionally.”
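For a rough feel of what that pricing means in practice, the arithmetic can be sketched in a few lines of Python. The $0.014 figure is the one Marschalko quotes; the usage of 50 requests a day over a 30-day month is a hypothetical number chosen purely for illustration, and actual API pricing may differ.

```python
import math

# Per-request cost quoted by Marschalko in the article.
COST_PER_REQUEST_USD = 0.014


def requests_per_budget(budget_usd: float, cost: float = COST_PER_REQUEST_USD) -> int:
    """Whole number of requests that fit into a given budget."""
    return math.floor(budget_usd / cost)


def monthly_cost(requests_per_day: int, days: int = 30,
                 cost: float = COST_PER_REQUEST_USD) -> float:
    """Estimated spend for a month of use, rounded to cents."""
    return round(requests_per_day * days * cost, 2)


if __name__ == "__main__":
    # Matches the "over 70 requests for $1" claim in the quote.
    print(requests_per_budget(1.0))
    # Hypothetical household making 50 requests a day.
    print(monthly_cost(50))
```

As the quote notes, the per-request cost scales with the length of the request, so a shorter prompt would bring these numbers down.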
The entire process is listed in this Medium blog post if you want to learn how to build your own assistant with its distinct features. If you have an OpenAI account and want to use the AI that Marschalko built in the video above, the Okay Smart Home shortcut is available to download and use with your own API keys.