Spoiler Alert!
Literally the day after I got the Echo to work with my home automation, there was a flurry of announcements about the Alexa Skills Kit. That’s a subject for another post. It turns out that this hack to do direct home automation is a better experience than what you can get by making an Echo “app”.
Anyone who has taken steps toward home automation can probably relate to the feeling of wrongness that the Amazon Echo has such limited options for integrating into a smart home. I don’t use the Belkin WeMo system or Philips Hue light bulbs. But it just seems like I should be able to say, “Alexa, turn on the kitchen light” and make it work with my setup. There’s just enough already built in that not being able to do this is frustrating.
Here’s what I did to get it to work. My solution is general enough that it can be easily tweaked to work with many different technologies as long as you’ve got some kind of API available.
My Setup
Does anyone besides me feel like the ability to tie multiple home automation technologies together, with reasonably scriptable APIs, has finally arrived? It used to take tons of custom code to interface to each different component. More often than not, the technology got upgraded (X-10 to Insteon, for example) before the integration code was finished. So I, at least, never got to the point of being able to use all those drivers from automation logic.
Is this a result of the world standardizing on HTTP and JSON encoding? Or did I finally reach critical mass with my own custom building blocks?
Here are the various technologies I’ve collected and installed over the years:
- Lights: Insteon with ISY-994i controller
- Alarm: Elk M1-Gold with Elk M1XEP Ethernet Interface
- Audio/Video control: Global Caché GC-100
- Media: Kodi (formerly XBMC) on Ubuntu
Tying this all together, I have a dedicated internal server running Apache and wsgi Python code. I’ve built small Python modules that implement functions for common actions on each device. So I have an elk.py, isy.py, xbmc.py, etc. I then provide a more-or-less unified RESTful API on an internal URL.
Amazon Echo’s Current Home Automation Support
Until the recent addition of support for the Wink hub, the Echo knew how to talk to only the Belkin WeMo switches and the Philips Hue lights. That’s great if you have those devices, but even with the added support for Wink, there’s no overlap between my home setup and what the Echo supports. And the minute you want to do something more complex, say a macro that turns down the lights and turns on the TV, you’re out of luck.
The WeMo devices use the UPnP protocol to advertise themselves on the network, respond to searches from controllers, and define the details of their control interfaces. The Echo searches for the WeMo devices specifically and is programmed to know about the WeMo API. Because the Echo uses so little of the UPnP protocol, it should be possible to emulate WeMo devices on the network in software.
Finding out how the Echo and the WeMo interact took some network sniffing with Wireshark. Because the Echo and WeMo are both WiFi devices, capturing the network traffic required a wireless adapter that could be put into “monitor mode”. This is often not possible under Windows. I used a USB WiFi adapter on a Linux system and had no problems. The next obstacle is the encryption between the access point and the WiFi devices. I wasn’t willing to turn off my security. Wireshark can decrypt the traffic if you tell it your SSID and passphrase, as long as the captured data includes the four EAPOL handshake packets from each device. This handshake is done when the device connects to the access point, so Wireshark needs to be capturing when you plug in the Echo and the WeMo.
Here’s an overview of the sequence of events that takes place when the Echo discovers WeMo switches and then responds to a voice command to turn the switch on:
Emulating the WeMo Switch
Creating a software emulation of the WeMo switch would allow me to have as many virtual WeMo devices as I wanted on my network, each with a different name. Each switch can be told to turn “on” or “off”, so the interface is pretty basic. But with an unlimited number of virtual switches, it doesn’t cost anything to have one called “Television”, another called “T. V.” that does the same thing as “Television”, and one more called “T. V. Mode” that turns the television on or off and sets the lights to their desired states.
Here’s what I decided I needed for my virtual WeMo cloud:
- An IP address for each virtual switch.
- A listener for UDP multicasts to address 239.255.255.250 on port 1900.
- A listener on port 49153 for each switch on its associated IP address.
- Logic to customize the search response and the setup.xml to conform to the UPnP protocol and give the Echo the right information about each switch.
- Logic to respond to the on and off commands sent by the Echo and tie them to whatever action I wanted to really perform.
I don’t know that the Echo requires a different IP address for each switch or if I can use multiple ports on a single IP address or even multiple URLs on a single port. In theory, if the Echo honors the data sent in response to its searches and requests, it should be possible to just assign each virtual switch its own URL. But it’s also possible that the Echo is hard-coded to use port 49153 and to POST to /upnp/control/basicevent1. I haven’t yet experimented to find the minimum conditions that will work, so I started by emulating each switch with its own IP address. With the Linux server I’m using, it’s easy to create eth0:1, eth0:2, etc. each with its own IP address.
239.255.255.250:1900 is the address and port specified by the UPnP protocol. Only one such listener is needed since it can send multiple responses, one for each switch, in response to a search request.
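As a rough illustration of that single listener (a sketch, not the code from this post), a UDP socket can join the multicast group with Python's standard socket module. The function names and the `handle_search` callback are hypothetical, and binding to all interfaces is an assumption.

```python
import socket

SSDP_GROUP = "239.255.255.250"  # multicast address from the post
SSDP_PORT = 1900                # port from the post


def membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq value: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def open_ssdp_listener() -> socket.socket:
    """Create a UDP socket that receives searches sent to the SSDP group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow other UPnP software on the same host to share the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", SSDP_PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(SSDP_GROUP))
    return sock


def serve_forever(sock: socket.socket, handle_search) -> None:
    """Hand every datagram and its sender address to a callback.

    handle_search is a hypothetical callable; it would send one reply per
    virtual switch back to the sender with sock.sendto().
    """
    while True:
        data, sender = sock.recvfrom(1024)
        handle_search(sock, data, sender)
```

Because replies go back over the same socket with `sendto()`, one listener can answer on behalf of every virtual switch.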
A search request from the Echo is a UDP multicast formatted as an HTTP request, with HTTP headers indicating what is being searched for. There is no body. The search request comes from the Echo’s IP address and (at least in my case) port 50000. The request looks like this:
    M-SEARCH * HTTP/1.1
    HOST: 239.255.255.250:1900
    MAN: "ssdp:discover"
    MX: 15
    ST: urn:Belkin:device:**
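Recognizing that request takes very little parsing. The following is a minimal sketch, assuming the datagram arrives as bytes and that matching on the request line, the MAN header, and the Belkin prefix of the ST header is sufficient; the function names are hypothetical.

```python
def parse_ssdp_headers(datagram: bytes) -> dict:
    """Split an SSDP datagram into its request line and upper-cased headers."""
    text = datagram.decode("utf-8", errors="replace")
    lines = text.splitlines() or [""]
    headers = {"_request_line": lines[0].strip()}
    for line in lines[1:]:
        # Split on the first colon only, so values such as
        # "239.255.255.250:1900" stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


def is_wemo_search(datagram: bytes) -> bool:
    """True if the datagram looks like the Echo's search for Belkin devices."""
    headers = parse_ssdp_headers(datagram)
    return (headers["_request_line"].startswith("M-SEARCH")
            and headers.get("MAN") == '"ssdp:discover"'
            and headers.get("ST", "").startswith("urn:Belkin:device:"))
```

Searches from other UPnP controllers on the network (for example, ones with a different ST value) are simply ignored by this check.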
Each UPnP device on the network that satisfies the search term is supposed to send a UDP message to the IP address and port that made the search request. The response is formatted as an HTTP response, but because this is UDP rather than TCP, there aren’t really any connections involved. One request from the Echo generates as many responses as there are switches on the network. The response from a switch looks like this:
    HTTP/1.1 200 OK
    CACHE-CONTROL: max-age=86400
    DATE: Mon, 22 Jun 2015 17:24:01 GMT
    EXT:
    LOCATION: http://192.168.5.190:49153/setup.xml
    OPT: "http://schemas.upnp.org/upnp/1/0/"; ns=01
    01-NLS: 905bfa3c-1dd2-11b2-8928-fd8aebaf491c
    SERVER: Unspecified, UPnP/1.0, Unspecified
    X-User-Agent: redsonic
    ST: urn:Belkin:device:**
    USN: uuid:Socket-1_0-221517K0101769::urn:Belkin:device:**
In order to avoid potential problems with my switch emulation, I tried to make it as compliant as possible with the UPnP specification and the actual behavior of the WeMo devices. I suspect that the Echo is probably not very picky, though, and many of the details could be ignored.
The 01-NLS header is supposed to contain a UUID that changes with every reboot. I assign a UUID each time the virtual switch is created, which happens each time I restart my WeMo emulation program.
The X-User-Agent is not part of the UPnP spec. However, this is what the WeMo sends and there’s at least one project on the web I’ve seen where they look for this string to find WeMo devices.
The uuid part of the USN header is supposed to be persistent for each device even across reboots. Belkin appears to use the fixed string “Socket-1_0-” followed by the switch serial number. I created a simple transformation of the switch’s friendly name into a generated serial number. That way, a switch with a given name will always generate the same USN. Having multiple switches on the network with the same name would cause unpredictable problems.
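The header rules above can be sketched in a few lines of Python. This is an illustration only: the hash-based `serial_for` transformation is an assumption (the post does not say which transformation is used), and `search_response` and its parameters are hypothetical names.

```python
import hashlib
import uuid
from email.utils import formatdate


def serial_for(friendly_name: str) -> str:
    """Derive a stable 14-character serial number from the switch name.

    Assumption: any deterministic mapping works; a truncated SHA-256 hash is
    used here so the same name always yields the same serial and USN.
    """
    digest = hashlib.sha256(friendly_name.encode("utf-8")).hexdigest()
    return digest[:14].upper()


def search_response(friendly_name: str, ip: str, port: int,
                    boot_id: str, now: float = None) -> bytes:
    """Build the UDP reply to a search, mirroring the captured WeMo headers."""
    serial = serial_for(friendly_name)
    lines = [
        "HTTP/1.1 200 OK",
        "CACHE-CONTROL: max-age=86400",
        "DATE: " + formatdate(timeval=now, usegmt=True),
        "EXT:",
        f"LOCATION: http://{ip}:{port}/setup.xml",
        'OPT: "http://schemas.upnp.org/upnp/1/0/"; ns=01',
        f"01-NLS: {boot_id}",  # changes on every restart
        "SERVER: Unspecified, UPnP/1.0, Unspecified",
        "X-User-Agent: redsonic",
        "ST: urn:Belkin:device:**",
        f"USN: uuid:Socket-1_0-{serial}::urn:Belkin:device:**",
        "",
        "",  # the two empty strings produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def new_boot_id() -> str:
    """Generate the per-restart UUID for the 01-NLS header."""
    return str(uuid.uuid4())
```

Calling `new_boot_id()` once when a virtual switch is created and reusing the value for every reply gives the behavior described above: the 01-NLS value changes on restart, while the USN stays fixed for a given name.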
Once the Echo receives the search response, it sends an HTTP GET request to the URL specified in the LOCATION header. That request is very minimal:
    GET /setup.xml HTTP/1.1
    Host: 192.168.5.189:49153
    Accept: */*
And the switch responds with the device description file, which is 133 lines long in the WeMo switch that I tested. The top part of that file contains this:
    <?xml version="1.0"?>
    <root xmlns="urn:Belkin:device-1-0">
      <specVersion>
        <major>1</major>
        <minor>0</minor>
      </specVersion>
      <device>
        <deviceType>urn:Belkin:device:controllee:1</deviceType>
        <friendlyName>kitchen light</friendlyName>
        <manufacturer>Belkin International Inc.</manufacturer>
        <manufacturerURL>http://www.belkin.com</manufacturerURL>
        <modelDescription>Belkin Plugin Socket 1.0</modelDescription>
        <modelName>Socket</modelName>
        <modelNumber>1.0</modelNumber>
        <modelURL>http://www.belkin.com/plugin/</modelURL>
        <serialNumber>221517K0101769</serialNumber>
        <UDN>uuid:Socket-1_0-221517K0101769</UDN>
        <UPC>123456789</UPC>
        <macAddress>94103E3489C0</macAddress>
        <firmwareVersion>WeMo_WW_2.00.8326.PVT-OWRT-SNS</firmwareVersion>
        <iconVersion>0|49153</iconVersion>
The <friendlyName> element is the name the Echo will listen for when you tell it to turn a device on or off. This needs to be set differently for each virtual switch.
The <serialNumber> and <UDN> fields use the same information that was used to populate the USN header of the search response.
Those are the only fields I change in my generated setup.xml response. I haven’t noticed any problems by not changing the <macAddress> even though it’s the same value for all of my virtual switches.
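Filling in those three fields is a simple template substitution. The sketch below uses a deliberately trimmed template built only from elements in the excerpt above; whether the Echo accepts a file this short is an assumption that has not been verified here, and the names `SETUP_XML_TEMPLATE` and `render_setup_xml` are hypothetical.

```python
from xml.sax.saxutils import escape

# Trimmed, illustrative template: only elements shown in the excerpt above.
# The real WeMo file is much longer.
SETUP_XML_TEMPLATE = """<?xml version="1.0"?>
<root xmlns="urn:Belkin:device-1-0">
  <device>
    <deviceType>urn:Belkin:device:controllee:1</deviceType>
    <friendlyName>{friendly_name}</friendlyName>
    <manufacturer>Belkin International Inc.</manufacturer>
    <modelName>Socket</modelName>
    <serialNumber>{serial}</serialNumber>
    <UDN>uuid:Socket-1_0-{serial}</UDN>
  </device>
</root>
"""


def render_setup_xml(friendly_name: str, serial: str) -> str:
    """Return the device description for one virtual switch."""
    # Escape the name so characters like "&" cannot break the XML.
    return SETUP_XML_TEMPLATE.format(friendly_name=escape(friendly_name),
                                     serial=serial)
```

The serial passed in should be the same value used in the USN header of the search response, so that the two stay consistent.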
When you tell the Echo to turn a device on or off, this is what it sends as an HTTP request to the device:
    POST /upnp/control/basicevent1 HTTP/1.1
    Host: 192.168.5.189:49153
    Accept: */*
    Content-type: text/xml; charset="utf-8"
    SOAPACTION: "urn:Belkin:service:basicevent:1#SetBinaryState"
    Content-Length: 299

    <?xml version="1.0" encoding="utf-8"?><s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body><u:SetBinaryState xmlns:u="urn:Belkin:service:basicevent:1"><BinaryState>1</BinaryState></u:SetBinaryState></s:Body></s:Envelope>
That’s an awful lot of stuff just for the “1” or “0” you actually care about in the <BinaryState> element. I didn’t bother with a SOAP parser. I simply look for the string SOAPACTION: "urn:Belkin:service:basicevent:1#SetBinaryState" in the data and, if it’s there, look for either <BinaryState>1</BinaryState> or <BinaryState>0</BinaryState>. The response is handled similarly, since it has no dynamic data other than the date and the content-length:
    HTTP/1.1 200 OK
    CONTENT-LENGTH: 295
    CONTENT-TYPE: text/xml; charset="utf-8"
    DATE: Mon, 22 Jun 2015 22:45:57 GMT
    EXT:
    SERVER: Unspecified, UPnP/1.0, Unspecified
    X-User-Agent: redsonic

    <s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body>
    <u:SetBinaryStateResponse xmlns:u="urn:Belkin:service:basicevent:1">
    <CountdownEndTime>0</CountdownEndTime>
    </u:SetBinaryStateResponse>
    </s:Body> </s:Envelope>
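The string-matching approach and the mostly static reply could look something like the following. It is a sketch under stated assumptions rather than the program described in this post: the function names are hypothetical, the request is assumed to arrive as one decoded string, and the content-length is computed from the body instead of being copied from the capture.

```python
from email.utils import formatdate

SOAP_ACTION = 'SOAPACTION: "urn:Belkin:service:basicevent:1#SetBinaryState"'

RESPONSE_BODY = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
    's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body>\r\n'
    '<u:SetBinaryStateResponse xmlns:u="urn:Belkin:service:basicevent:1">\r\n'
    '<CountdownEndTime>0</CountdownEndTime>\r\n'
    '</u:SetBinaryStateResponse>\r\n'
    '</s:Body> </s:Envelope>\r\n'
)


def requested_state(request: str):
    """Return True for "on", False for "off", None for anything else."""
    if SOAP_ACTION not in request:
        return None
    if "<BinaryState>1</BinaryState>" in request:
        return True
    if "<BinaryState>0</BinaryState>" in request:
        return False
    return None


def soap_response(now: float = None) -> bytes:
    """Build the HTTP reply; only the date and content-length vary."""
    body = RESPONSE_BODY.encode("utf-8")
    head = "\r\n".join([
        "HTTP/1.1 200 OK",
        f"CONTENT-LENGTH: {len(body)}",
        'CONTENT-TYPE: text/xml; charset="utf-8"',
        "DATE: " + formatdate(timeval=now, usegmt=True),
        "EXT:",
        "SERVER: Unspecified, UPnP/1.0, Unspecified",
        "X-User-Agent: redsonic",
        "",
        "",  # blank line separating headers from the body
    ]).encode("ascii")
    return head + body
```

Returning `None` for unrecognized requests lets the caller decide whether to ignore them or answer with an error.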
That’s all it takes to finish the dialog necessary to make the Echo think the software is a genuine WeMo switch. A tiny bit of extra code wired the “1” and “0” commands into REST API requests, and I was able to make it all work from voice command to action. When I walk in from the garage at night, I can tell Alexa to turn on the kitchen lights without ever having to reach for a wall switch.
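The last step, turning the on/off state into a REST call, might be wired up roughly as follows. The URLs, the `ACTIONS` table, and both function names are hypothetical placeholders, since the internal API is not shown in this post.

```python
from urllib.request import urlopen

# Hypothetical endpoints on an internal automation server.
ACTIONS = {
    "kitchen light": {
        True: "http://automation.example/api/isy/kitchen_light/on",
        False: "http://automation.example/api/isy/kitchen_light/off",
    },
}


def action_url(switch_name: str, state: bool) -> str:
    """Look up the REST URL for turning a named switch on or off."""
    return ACTIONS[switch_name][state]


def perform(switch_name: str, state: bool, timeout: float = 5.0) -> bool:
    """Issue the REST request; True if the server answered with HTTP 200."""
    with urlopen(action_url(switch_name, state), timeout=timeout) as reply:
        return reply.status == 200
```

Aliases such as “Television” and “T. V.” would simply be two entries in the table pointing at the same URLs.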
That’s really all I care about for my WeMo emulation. In particular, the search request that Echo broadcasts is not the same as the search performed by the WeMo app on my Android phone. So the WeMo app can’t see or control my virtual switches. This is not a drawback for me, so I don’t intend to flesh out the UPnP handling.
So Where’s the Code?
Update
The code is now available. You can read about it in my Virtual WeMo Code for Amazon Echo post.
The code is a single Python file and isn’t very complex. I am happy to release the source except that right now it contains the entire contents of the setup.xml I captured from my WeMo switch. I don’t know how Belkin would feel about me publishing that file beyond the short excerpt above.
My intention is to spend one or two more evenings experimenting to see which components the Echo cares about and which ones can be removed or changed and have the Echo still work. I’ll then produce my own contents that make the Echo happy. Of course, that’ll make my emulated switches even less compatible with other WeMo apps, but I don’t consider that a downside. As soon as that’s done, I’ll update the code and post it.
Stay Tuned!
Less than 24 hours after I got this working and was controlling my home automation with voice commands, Amazon opened access to the Alexa Skills Kit. My next post about the Echo will talk about why it’s not as good as the WeMo emulation for basic on/off control, and what I’ve done with it to integrate the Kodi media system with the Echo.
I stumbled across this just as I was about to give up trying to interface my Echo with my Mac Mini Server. I look forward to hearing more on this.
I don’t like the SDK solution for local network things; it seems inefficient. I really like the direction you’re taking with UDP and SOAP.
I signed up for the SDK as well, but it is much uglier than I hoped, without a number of features I wish it would have.
I’d love to play with your much simpler example.
Very very cool. I have a modest home automation system that is hacked together from lots of different devices. I have a central controller that ties them all together.
Right now I am on the waitlist for an Echo, but want to start getting the code in place to make it work as soon as I get it.
Looking forward to seeing what code you can share once you get it cut down and ‘de-WeMo’ed’.
The code is now available on GitHub. Read about it here http://www.makermusings.com/2015/07/18/virtual-wemo-code-for-amazon-echo/
Thank you Chris, your work here really fits a niche in my home. Now I just need to get better with Python (and now I have an excuse).
Chris,
I just wanted to thank you for all the work you did creating the Fauxmo program and especially for taking the time to document it for the rest of the world to use. I have it on a raspberry pi where it calls simple PHP programs on the webserver running on the Pi. These programs simply invoke BASH commands to the “heyu” program which talks to my CM11A X10 controller. I assigned multiple IPs to the Pi, one for each device I need to control. I am looking at buying a Harmony G1 Model #6007D to replace the CM11A controller so I can control devices using other protocols like ZigBee and Insteon.
Just to clarify, you don’t need to use multiple IPs on a Pi if all the different interface devices (like the CM11A) are on the same Pi.
Hey Peter;
I’d love to get a look at your scripts. I’m hitting a wall trying to parse each device command…
thanks,
Mike
Thanks for posting this up. If you haven’t already, would you mind if I linked to this on the UDI forums?
By all means, please feel free to post a link!
Hi Chris. This works nicely BUT, I noticed that the Echo does not detect more than 13 devices. Once, it detected 15, but that is the max I’ve seen. Currently, I have a Pi running the Echo HA Bridge, which allows me to have 28 devices, and fauxmo, which allows me 13, for a total of 41. Is there a way to support more devices? Fauxmo is way lighter on the processor than HA Bridge, so it seems to be a much better option.
Kudos by the way, excellent programming.
Thanks for your kind comments.
It turns out that there are a few limits in the Echo. The first one is the amount of time it will spend searching for devices. The code on Github has been updated to work better with this time limit. However, there also appears to be a hard-coded limit of 16 WeMo devices that the Echo can use. If I had to guess, I’d say it’s probably a limit to the size of the internal table that the Echo’s programmers put in place. The updated code can now provide all 16 (virtual) devices to the Echo.
Hi Chris,
Really neat stuff you’re doing!
I have an Echo and an Elk M1 system with the XEP Ethernet interface and would be really interested in finding out more about your elk.py. I can generate any lighting, output, task, etc. commands to get the M1 to act; I’m just trying to get my head around how to pull the API thing together.
Happy to get a Raspberry Pi if a server is required. Any help with the basic building blocks would be much appreciated.
I’m not a proficient coder like you; I just scratch around in the coop! (-: If I can see what you are doing in your elk.py and it is step and repeat for all the M1 commands I need, I am good with that.
Appreciate any assistance to get me up and going.
Cheers,
Ian from downunder…
Hi Ian,
Here’s my elk.py library. It’s very basic and I haven’t done any real documentation on it. But hopefully it’s clear enough for you to figure out how to use it. Follow up with questions and I’ll do my best to help out.
elk.py download
Hello
Does your software nowadays work with more than 16 devices?
I’m looking for a solution that will run a script on a Raspberry Pi depending on the Alexa voice command. However, I need way more than 16 commands. Could you recommend something for this?
The last time I looked at this, 16 devices was the maximum that the Echo supported. This is a limitation within the Echo’s firmware.
Amazon has released a bunch of API updates in the last few months, though. There may be a way to control more than 16 things, either with one of the newer APIs or by mixing several approaches together. I hope to have time to look into this soon.
I was wondering: does anyone have the handshaking for a WeMo fader/dimmer switch? I’d investigate myself, except they seem to be available only in 110 V land and not 220 V.
I am trying to detect smart home devices that are connected to two Raspberry Pi 3 boards, but the Echo is not detecting them. Please advise.
Hey Chris,
Thanks for this. I’ve been trying different flavors of this handshaking and decided to add it to my ESP8266 devices, which I’ve been controlling using MQTT lately. I’ve used FauxMoESP also, but wanted to integrate MQTT. Your code was simple enough to adopt with MQTT instead of the generic HTTP requests.
Listening to UDP packets across your local network can uncover all kinds of devices. I just discovered that my Sony Bravia TV is broadcasting its services 🙂
Hi Chris,
Thanks for the article, it’s very helpful.
We are using “urn:Belkin” services in this article and found the same in other blogs as well. My question is: can we use a custom one like “urn:XYZ”?
The Echo is programmed to query the network for specific types of devices, and sends out a search for urn:Belkin. I don’t know what would happen if your code replied using a non-Belkin urn, but I suspect that the Echo would ignore it.
Hi Chris,
I googled about a lot and found your code – this has saved me many hours of tinkering with code.
Is there any way I can make a small donation (paypal for example) to let you buy a new toy/gadget.
-Jim
Jim,
That’s very generous, thank you! However, I’m not really set up to take donations. If you’re going to buy something from Amazon, getting to them by clicking one of the product links on my site could earn me a small commission. But it’s totally unnecessary and I’m glad you found the code to be helpful.
Chris,
This is amazing what you have done. I plan to implement some of it. It appears that everywhere fauxmo is used, it is only for switching and controlling the on/off state of devices. This will work great for a garage door, light switch, or any power type of application. What do you do about feedback from a sensor? Is there a way to read a sensor with fauxmo or could you recommend an approach to accomplish that? For example: I have a sump pump with a water level sensor as well as temperature, humidity, and door sensors that can tell me open/closed or numerical data related. I would appreciate your suggestions. Thanks for sharing your fauxmo software. It is very clever.
Rob,
The Echo is quite limited with what it can do for home control. It’s more of a voice-activated remote control than anything, with essentially no automation abilities. As you point out, it can’t easily retrieve sensor or state information.
To do what you want with Alexa, you’d have to create a skill, which talks to a server, which talks to your sensors. Having a server accessible via the internet with properly-configured SSL certificates, and which can access your sensors is a big job. Then, on top of that, the voice commands get a little clunky because you have to tell Alexa to ask your skill to get the values.
Probably a dumb question: how do you set which GPIO pin is used to turn the relay on or off?