When I got my Amazon Echo, it seemed like a natural fit into my home automation setup, allowing me to add voice control and speech output.
Except it wasn’t.
I don’t use any of the very few devices that the Echo knows how to control, and Amazon didn’t provide any way to extend or customize the Echo’s functionality. Intrepid hackers who’d come before me had used the To Do List feature as a rudimentary, if unnatural, way to build custom voice-controlled actions. I took it in a different direction by using fake WeMo devices, which I described in “Amazon Echo and Home Automation,” and that worked pretty well.
I spent a few evenings getting the virtual WeMo devices to work. And the very next day, Amazon released their SDK. Time to take things to the next level.
Anatomy of an Amazon Echo App
Amazon calls their apps “skills”, and their SDK is therefore the Alexa Skills Kit or ASK. There is no charge to use the Alexa Skills Kit, and presumably Amazon is hoping that many developers will publish all sorts of cool new apps, er, skills.
The code for an Alexa skill lives on a web server on the Internet. When you activate your app with a voice command to your Echo, the Echo sends the request to Amazon’s servers. They, in turn, send an HTTPS request to your app’s URL. Your code does whatever you like, then sends a response back to the Amazon servers, which then instruct your Echo to speak the reply you programmed.
The key thing to notice here is that your app’s code needs to live on a web server that can be reached from the Internet. If you don’t already operate a public-facing web server, you’ll have to set one up yourself or subscribe to an appropriate hosting service. Your other option is to use Amazon’s AWS Lambda service, which has a free tier that should be plenty for most personal Alexa Skills projects. However, Lambda limits the languages you can use to program your app. And more importantly for home automation purposes, code running on Lambda or on a hosted web server probably won’t be able to reach the devices on your home network. The rest of this post assumes that the web server is running from your home and either uses a static IP address or a dynamic DNS service.
The next important requirement is that your web server must support HTTPS on the standard port 443. There’s no option to use HTTP without SSL. If you’re setting up a local web server, this is an extra step you need to handle. But as long as you don’t intend to publish a public app for other people to use, Amazon will let you use a self-signed certificate that saves you the hassle and cost of buying a commercial certificate.
With those requirements in mind, an Alexa Skill boils down to a handful of web requests and responses, exchanging information in JSON format. Just about any web application technology and programming language can be used. Libraries that support JSON encoding and decoding are easy to find.
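To make the JSON exchange concrete, here is a minimal Python sketch of the response side. The field names (`version`, `response`, `outputSpeech`, `shouldEndSession`) reflect my understanding of the ASK response format and should be checked against Amazon’s documentation; the function name `build_speech_response` is made up for illustration.

```python
import json


def build_speech_response(text, end_session=True):
    """Return a JSON string asking the Echo to speak `text`.

    Assumption: the field names below follow the ASK response format.
    """
    reply = {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }
    return json.dumps(reply)
```

Your web application would send this string back as the body of its HTTPS response.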
The Dialogue with Your App
In order to support lots and lots of new skills, each one needs a unique name. When you use your app, you can’t just say “Alexa” followed by input to your app; you need to say either “Alexa ask _________ to” or “Alexa tell _________ to” with your app’s name in the blanks. Amazon has documented that they won’t allow apps to have one-syllable names, so invoking your app requires at least four more syllables for each voice command than when using the Echo’s built-in features. This is slightly inconvenient, but it’s easy to see that without this, apps from different developers would have all sorts of conflicts.
I find the extra words to be too verbose when I just want to turn on a light. I’d much prefer to say, “Alexa, turn on the kitchen lights” than “Alexa, ask Walter to turn on the kitchen lights.” So the use of software virtual WeMo switches is still the better approach for most home automation commands. This has the additional benefit that the network communication stays local and doesn’t require the trip through the Internet to your web server.
When designing your app, you decide the types of things that it will be able to do. In the nomenclature of the ASK, these are called Intents. Each Intent matches one kind of question or instruction that you’ll speak to the Echo. You also get one special Intent for when you just say, “Alexa open Walter.”
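In code, this usually turns into a small routing table. The sketch below assumes the incoming JSON carries a `request` object whose `type` is `LaunchRequest` for the “open” case and `IntentRequest` for everything else, which is how I understand the ASK format. The intent name `LightsIntent` and the handler functions are hypothetical.

```python
def on_launch(request):
    # Hypothetical handler for "Alexa open Walter".
    return "Walter is ready."


def on_lights(request):
    # Hypothetical handler for a made-up LightsIntent.
    return "Turning on the lights."


# Map each intent name to the function that handles it.
INTENT_HANDLERS = {"LightsIntent": on_lights}


def route(envelope):
    """Pick the reply text for a decoded request (a dict)."""
    request = envelope.get("request", {})
    kind = request.get("type")
    if kind == "LaunchRequest":
        return on_launch(request)
    if kind == "IntentRequest":
        name = request.get("intent", {}).get("name")
        handler = INTENT_HANDLERS.get(name)
        if handler is not None:
            return handler(request)
    return "Sorry, I don't know how to do that."
```

Adding a new Intent then means writing one more handler function and adding one entry to `INTENT_HANDLERS`.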
The next thing you need is to figure out all the ways that you and your other app users are likely to phrase their questions and instructions for your app. This is where the flexibility and ambiguity of the English language can cause things to get a little messy. Amazon wants you to provide as many examples as possible to give you the best chance that it will match what you’re likely to actually say. Each example is called an Utterance, and when you provide all the combinations of the words people might use, the list of examples can easily grow into the many hundreds.
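Typing out hundreds of examples by hand gets old fast, so it can help to generate them. Here is a rough Python sketch that expands groups of alternative words into every combination. The intent name `NewEpisodesIntent`, the phrases, and the “intent name followed by the phrase” line layout are all assumptions for illustration; adjust the output to whatever format the developer console expects.

```python
from itertools import product


def expand_utterances(intent_name, *word_groups):
    """Build one sample line per combination of the alternatives.

    Each argument after intent_name is a list of interchangeable words
    or phrases; an empty string marks an optional word.
    """
    lines = []
    for combo in product(*word_groups):
        phrase = " ".join(word for word in combo if word)
        lines.append(f"{intent_name} {phrase}")
    return lines


utterances = expand_utterances(
    "NewEpisodesIntent",           # hypothetical intent name
    ["what", "which"],
    ["shows", "series"],
    ["have new episodes", "are waiting to be watched"],
)
```

Three groups with two alternatives each already yield eight lines, which shows how quickly the list grows once you add more variations.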
Intents, and their corresponding utterances, can include places where you expect details from your users. For example, if you have a movie information app, you might support a request to find out the IMDB rating of a movie. The spoken question will include the name of the movie for your app to look up. The variable parts of your intents are called Slots, and each slot has a name. The slot names and their values extracted from the utterances are passed to your app’s code.
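Here is a small Python sketch of pulling the slot values out of a request. It assumes the slots arrive under `request.intent.slots`, keyed by slot name, with each entry holding a `value`, which is my understanding of the ASK request format. The intent name `MovieRatingIntent` and slot name `Movie` are invented for the movie example above.

```python
def extract_slots(envelope):
    """Return {slot_name: spoken_value} for a decoded intent request.

    Slots that the user did not fill in are left out of the result.
    """
    intent = envelope.get("request", {}).get("intent", {})
    slots = intent.get("slots") or {}
    return {
        name: slot["value"]
        for name, slot in slots.items()
        if slot.get("value") is not None
    }


# Hypothetical request for "what is the rating of the matrix".
sample = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "MovieRatingIntent",
            "slots": {"Movie": {"name": "Movie", "value": "the matrix"}},
        },
    }
}
movie = extract_slots(sample).get("Movie")
```

Using `.get("Movie")` rather than indexing keeps the code from crashing when the Echo matched the intent but didn’t catch the movie name.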
Creating Your Own Alexa Skill
Here are the basic steps you’ll want to follow to start your own Alexa Skill.
- Make sure you have a web server that can be reached over the Internet and that supports SSL on port 443.
- Sign up for the Amazon Developer Portal with your Amazon login.
- In the developer console, add a new Alexa Skill.
- Give your skill a display name (seen in the Echo App) and an invocation name (what you say).
- Provide a URL for your web server. All app requests will come to this URL. It must start with https:// and it must be accessible over the Internet.
- Define your intents and example utterances. We’ll go into this in more detail in Part 2 of this article.
- Select the appropriate option depending on whether your web server uses a commercial certificate from a well-known certificate authority or a self-signed certificate that you made yourself.
- Verify that test mode is enabled.
When those steps are done, you can start coding your app. If you named your app “Walter,” you should be able to verify that your web server URL is being accessed when you tell your Echo, “Alexa open Walter.”
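If you don’t have any server code yet, a bare-bones endpoint is enough for this first check. The following is only a sketch using Python’s standard library, not a production setup: it prints the type of each request it receives and always answers with the same sentence. The class name `SkillHandler`, the helper `make_server`, the reply text, and the certificate file names are assumptions, and the JSON field names reflect my understanding of the ASK format.

```python
import json
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer


class SkillHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body of the incoming request.
        length = int(self.headers.get("Content-Length", 0))
        envelope = json.loads(self.rfile.read(length) or b"{}")
        kind = envelope.get("request", {}).get("type", "unknown")
        print("Received request of type:", kind)

        # Always reply with the same spoken sentence.
        reply = {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": "Walter is listening."},
                "shouldEndSession": True,
            },
        }
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json;charset=UTF-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(host, port, certfile=None, keyfile=None):
    """Create the server; wrap it in TLS when a certificate is given."""
    server = HTTPServer((host, port), SkillHandler)
    if certfile:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
        server.socket = context.wrap_socket(server.socket, server_side=True)
    return server


# Example (file names are placeholders; port 443 usually needs root):
# make_server("0.0.0.0", 443, "walter.pem", "walter.key").serve_forever()
```

If the request type shows up in your console when you say “Alexa open Walter,” the plumbing between Amazon and your web server is working.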
Keep Reading!
In Part 2 of this article, I tie this back to home automation and describe what I did to integrate the Echo with my Kodi media player. I can ask my Echo to tell me what shows have new episodes waiting to be watched.
Is there a way to have Alexa respond to “open” and “close” rather than “on” and “off”? I am creating a device that will control window blinds and saying “turn on the blinds” just doesn’t feel natural.
Thanks.
Ideally it would be great to be able to make this functionality change at the local level and not have to build an Alexa skill.
Thanks.
What about the security of having your home automation residing on a server that is accessible from the Internet? Can anyone get in if they know your hostname? My HA server is in my home with no external hostname (just a non-routable IP address). How do I get to it from Alexa?
In order to publish an Alexa skill officially, Amazon requires that you implement some basic security protections. In particular, your server code must verify that the requests are actually coming from Amazon by checking their SSL certificate. You must reject any requests that can’t be authenticated like this. If you don’t reject a non-Amazon request, your publication is denied until you fix it.
This restriction is not enforced during development, so it’s up to you whether you want to be this secure for a private, unpublished skill. But if you’re concerned about security at all, it’s not very difficult to implement this check.
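As a rough illustration, here is one piece of that check in Python. As I recall from Amazon’s documentation, each request carries a `SignatureCertChainUrl` header, and the URL in it must use HTTPS, point at `s3.amazonaws.com`, use port 443 if a port is given, and have a path starting with `/echo.api/`. Treat those rules as my assumption and confirm them against the current documentation. The function name is made up, and this sketch does not cover the rest of the verification (downloading the certificate, validating it, and checking the request signature).

```python
import posixpath
from urllib.parse import urlparse


def is_valid_cert_chain_url(url):
    """Check that a certificate URL plausibly points at Amazon.

    Assumed rules: https scheme, host s3.amazonaws.com, port 443 or
    unspecified, and a normalized path that starts with /echo.api/.
    """
    parts = urlparse(url)
    try:
        port = parts.port
    except ValueError:
        # The port was not a valid number.
        return False
    # Collapse "../" segments so they can't sneak out of /echo.api/.
    path = posixpath.normpath(parts.path)
    return (
        parts.scheme.lower() == "https"
        and (parts.hostname or "").lower() == "s3.amazonaws.com"
        and port in (None, 443)
        and path.startswith("/echo.api/")
    )
```

Your server would reject any request whose header fails this test before doing anything else with it.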
I enjoyed reading your articles about Alexa, especially the part “OK, Master, whatever you say”. Somehow it made me laugh =)