The Nest Learning Thermostat: Unsafe at Any Temperature?

David Young

7 October 2014

The following is my account of a serious failure of the Nest Learning Thermostat in my house in January 2014. I have also written down some lessons about software and society that I draw from the experience.

My wife and I have been using the Nest Learning Thermostat at our house in Urbana, Illinois, since shortly after we moved in (August 2013). It worked splendidly until January 2014, when Nest pushed a software update (4.0.0) that made our thermostat run down its batteries and stop heating the house. This created a dangerous and uncomfortable situation at our house during the extremely cold, snowy winter weather caused by the "arctic vortex".

Our house was not the only one affected by this Nest thermostat failure. On Nest's Facebook page and on its technical-support message boards, people all over the Midwest and Northeast reported their disappointment and desperation as temperatures plunged outdoors and indoors. One person fretted that the Nest at their second home in Maine had stopped responding. A homeowner in Detroit described how their Nest thermostat was letting them down with behavior resembling our own thermostat.

In print and online news, the Nest failures during the arctic vortex were scarcely reported—I have found just one article—and Google's purchase of Nest quickly eclipsed all other Nest news. Nest needs to own the problem and tell customers how it is going to improve its product and business practices to avoid creating unnecessary home-heating crises in the future. I have written this account so that the incident is not forgotten.

The Story

The Nest has been a pretty good thermostat for us. We find it helpful to be able to adjust the temperature from any room in the house, or from out of town, using our iPhones. The Nest automates some energy-saving strategies that are difficult or impossible to employ using a standard thermostat: it changes the temperature set-point when we leave the house, runs the circulating fan instead of the A/C compressor or the furnace to increase comfort, et cetera.

The Nest thermostat relies on contacting Nest.com servers to provide some of its useful functions, and it automatically fetches software updates from Nest. I knew when I installed the Nest thermostat that I was putting a lot of faith in Nest, who could affect the usefulness and usability of the thermostat at my house by pushing a software update to the thermostat that "improved" the user interface (thereby ruining the interface—you know how it goes), by turning off the servers that provide the monthly energy reports, or by going out of business. But I was won over by the sleek appearance and the many useful functions of the thermostat. I also was ready to try something new after using many crummy programmable thermostats. And we needed a new thermostat: the LCD display on our old programmable was failing. So I chose the Nest.

Monday, January 6, was a very cold day in Urbana. The low temperature that day was -14°F. On that day, our thermostat was already displaying some odd behavior. Sometimes we would glance at it and there was something unfamiliar on the display: it had just rebooted. Sometimes a little tick-tick noise that it made as it restarted would grab our attention. The thermostat's network connection seemed intermittent. This was all a bit unnerving, but the house kept warm until bedtime.

When we got out of bed on the morning of January 7, the house was quite cold. After a while, a pattern was apparent: the thermostat would run its battery down to a critically low voltage and shut itself down. While the thermostat was shut down, the furnace would not run. Meanwhile the house would cool some more. The thermostat eventually would start back up and run the furnace for a while. Before the furnace had brought the temperature up very far, the battery would run down again. Then the cycle would repeat. As we struggled to restore the Nest thermostat to working order, the temperature inside the house plunged to below 50 degrees. My wife and I were quite uncomfortable. We were worried that pipes in the house would freeze. Our Boston terrier would not budge from the bed, where he curled pitifully under the covers.

Just in case you are not familiar with blizzards and ice storms, I will mention some of the risks that a broken thermostat during a winter storm will expose you to. Trips outdoors are unusually risky during bad winter weather, and they should be avoided. Tree limbs laden with snow and ice and stressed by high winds can fall and kill you outdoors. You and your car can end up in a collision or in a ditch when ice and packed snow make roads slippery. By resorting to the use of space heaters instead of central heating, you have to accept new hazards, like fires and burns. Property damage is a real danger. As the indoor temperature falls, water in pipes near your house's exterior will freeze, breaking pipes and fixtures, causing an uncontrolled flow in your home and costly water damage. Before the day was through, I had made a car trip on dangerous roads so that I could buy a micro-USB cable (to recharge the Nest) and a space heater for our dining room. I wouldn't have run the risk of taking the trip or running a space heater if I could have helped it.

On the advice of Nest technical support, I recharged our Nest thermostat by plugging it into my laptop computer with the micro-USB cable I fetched from a store. (Nest did not include a micro-USB cable in the box—they should have.) Then I turned off the Wi-Fi on the thermostat to conserve power. That stopped the thermostat from resuming the cycle where it ran down its battery and shut down to recharge.

A Catch-22 situation was soon apparent: Nest could fix the thermostat by installing the previous software version, 3.5.3, but our thermostat had to be connected to the Internet via Wi-Fi for them to do it. Reactivating the Wi-Fi would reactivate the trouble, and the update would not be available right away, so the shutdown/restart cycle would resume. In the end, I coordinated with Nest technical support so that the thermostat could download and start running 3.5.3 shortly after the reactivation of Wi-Fi.

The Catch-22 reveals a basic design flaw in the Nest thermostat: there is no way for the owner of a Nest thermostat, on their own, to roll back to the previous software version if there is a problem with the latest update. They need the help of technical support, they need a Wi-Fi connection and an Internet connection, and their thermostat needs to keep running long enough that it can download and install the previous version. If the thermostat cannot use Wi-Fi or the Internet because of a defect in the latest update or other circumstances, such as a fallen utility pole, then the owner is up a creek.
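To make the missing capability concrete, here is a minimal sketch of a local rollback scheme in Python. It is not how the Nest works; the class FirmwareSlots, its fields, and the threshold of three failed boots are all hypothetical. The idea is only that the thermostat keeps the previous image on the device, so it can restore it without Wi-Fi, the Internet, or technical support.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirmwareSlots:
    """Hypothetical two-slot firmware store: one active image, one local fallback."""
    active: str
    previous: Optional[str] = None
    failed_boots: int = 0
    max_failed_boots: int = 3  # assumed threshold, not a Nest parameter

    def install(self, version: str) -> None:
        # Keep the version that was running so it can be restored offline.
        self.previous = self.active
        self.active = version
        self.failed_boots = 0

    def record_boot(self, healthy: bool) -> None:
        # "Healthy" is assumed to mean: stayed up and kept regulating temperature.
        if healthy:
            self.failed_boots = 0
            return
        self.failed_boots += 1
        if self.failed_boots >= self.max_failed_boots:
            self.roll_back()

    def roll_back(self) -> bool:
        # No network needed: the fallback image is already on the device.
        if self.previous is None:
            return False
        self.active = self.previous
        self.previous = None
        self.failed_boots = 0
        return True


slots = FirmwareSlots(active="3.5.3")
slots.install("4.0.0")
for _ in range(3):
    slots.record_boot(healthy=False)  # three bad boots in a row
print(slots.active)  # 3.5.3
```

The same roll_back method could also be bound to a menu item on the device, so that the owner can trigger it by hand.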

Lessons

There are lessons to be learned from this incident. Lesson #1: our society may have a blind spot for technological dangers where agency and intelligence are not evident. Where the Internet of Things (things like thermostats) and cloud computing are concerned, dangers may crowd into that blind spot. Stories about nations and political activists and terrorists using worms, viruses, and distributed denial-of-service attacks are all of a piece with the daily news, TV and movie plots, literature and history, where one person or group cunningly gets the upper hand on another for some nefarious reason. Where machines are concerned, science fiction is full of fictional machines that deviate from their original mission and behave with independence and intelligence: Skynet, Colossus, HAL 9000, the computer in WarGames, and the fracking Cylons. If and when self-aware machines ever menace us, we will shiver with recognition. But machines without human-like thought and purpose don't figure in as many memorable crises, real or imagined, as the smart and purpose-driven ones.

If hackers had hacked Nest thermostats and turned off the heat in thousands of U.S. homes during a cold snap, then everyone would get that. Likewise, if the Nest thermostats had turned on their masters and mischievously turned off the heat, then that would have confirmed our worst nightmares. But a software update centrally dispatched to thousands of thermostats with a defect that blithely stops the heat just doesn't have as many recognizable pop-culture parallels as the other threats. Avoidable mistakes and predictable accidents don't register as serious dangers in the way that offensive actions do.

Lesson #2: centralizing control in a networked system makes the system brittle. In a resilient networked system, individual network nodes are capable of autonomous action and fault recovery. A networked thermostat should be able to recover from a dodgy software update without help from a network hub. The Nest thermostat/cloud complex is shot through with brittleness. Case in point: it is handy to be able to adjust the thermostat set point at bedtime from the bedroom using one's iPhone, instead of padding downstairs in sock feet to rotate the thermostat's bezel. Remote control of the temperature is probably the most-used feature of the Nest at my house. With a loss of the Internet connection, Nest remote control dissolves, even if the household Wi-Fi network is up and running. The Nest thermostat is the only product in my networked home that does not form an autonomous networked unit with the other devices at home: the computers, the printer, Wi-Fi access points, smart phones.
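A remote control that survives an Internet outage could try the household network first and fall back to the cloud only when the phone is away from home. The Python sketch below illustrates that ordering. The function send_set_point and the transport callables are hypothetical stand-ins, not any real Nest API; each transport is assumed to return True on delivery or raise ConnectionError when its path is down.

```python
from typing import Callable, Optional, Sequence, Tuple

# A transport delivers a set point (degrees F) and reports success.
Transport = Callable[[float], bool]


def send_set_point(
    target_f: float, transports: Sequence[Tuple[str, Transport]]
) -> Optional[str]:
    """Try each path in order; return the name of the one that worked."""
    for name, send in transports:
        try:
            if send(target_f):
                return name
        except ConnectionError:
            continue  # this path is down, try the next one
    return None


def lan_ok(target_f: float) -> bool:
    return True  # stand-in for a direct request over household Wi-Fi


def cloud_down(target_f: float) -> bool:
    raise ConnectionError("no Internet connection")


# The LAN path is listed first, so an Internet outage does not matter at home.
print(send_set_point(68.0, [("lan", lan_ok), ("cloud", cloud_down)]))  # lan
```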

Lesson #3: it is difficult and very unusual to develop performance envelopes for a software product, but perhaps that should be part of the software-engineering routine. In aeronautical engineering, a performance envelope is a diagram that tells the safe combinations of speed and altitude for a particular airplane in straight and level flight. It answers questions about an airplane such as: how slow can it fly at 3,000 feet without stalling? How high can it fly at Mach 1? Extrapolating from aeronautical engineering to software engineering, a performance envelope may answer questions about a product under software control such as: how long can the thermostat run the thermostat function on the CPU and meanwhile transmit on Wi-Fi with a 0.1% duty cycle if the battery is 75% depleted and not charging? A performance envelope developed with user needs and priorities in mind may be an important guide for the software-development process. For example: regulate the temperature at all costs in winter, but operate the display and activate Wi-Fi only if there is enough charge to continue to regulate the temperature during an extended vacation or Internet outage (for three weeks, say). Perhaps the performance envelope could be incorporated into the operating system as the parameters for a resource scheduler.
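As a rough illustration of an envelope serving as scheduler parameters, the Python sketch below admits optional loads (Wi-Fi, display) only if the remaining charge still covers the essential load for a required time. Every number in it is made up for the example: the 2,000 mAh battery, the current draws, and the duty cycles are assumptions, not measurements of a Nest thermostat.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Load:
    name: str
    active_ma: float   # current draw while running, in milliamps (assumed)
    duty_cycle: float  # fraction of time the load runs, 0.0 to 1.0
    essential: bool    # True if needed to regulate the temperature

    @property
    def average_ma(self) -> float:
        return self.active_ma * self.duty_cycle


def runtime_hours(charge_mah: float, loads: List[Load]) -> float:
    """Hours until the charge is gone at the combined average draw."""
    total_ma = sum(load.average_ma for load in loads)
    if total_ma <= 0:
        raise ValueError("loads must draw some current")
    return charge_mah / total_ma


def admit_loads(charge_mah: float, loads: List[Load], required_hours: float) -> List[Load]:
    """Always admit essential loads; admit optional ones only inside the envelope."""
    admitted = [load for load in loads if load.essential]
    for load in loads:
        if load.essential:
            continue
        if runtime_hours(charge_mah, admitted + [load]) >= required_hours:
            admitted.append(load)
    return admitted


remaining_mah = 2000 * 0.25   # hypothetical battery, 75% depleted, not charging
three_weeks = 21 * 24         # required runtime in hours
loads = [
    Load("control", active_ma=5.0, duty_cycle=0.10, essential=True),
    Load("wifi", active_ma=200.0, duty_cycle=0.001, essential=False),  # 0.1% duty cycle
    Load("display", active_ma=50.0, duty_cycle=0.02, essential=False),
]
print([load.name for load in admit_loads(remaining_mah, loads, three_weeks)])
# ['control', 'wifi']
```

With these figures, the control function plus Wi-Fi lasts about 714 hours, which clears the 504-hour requirement, while adding the display would cut the runtime to about 294 hours, so the display stays off.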

If software engineers cannot more rigorously and transparently specify the performance of their products, and then meet their specifications with a high degree of confidence, then perhaps we should leave some problems (like HVAC control) for the mechanical/civil/aeronautical/electrical engineers to solve.

Related to Lesson #2 and Lesson #3, Lesson #4 is this: many problems of security in software-based or Internet-connected products should probably be re-framed as problems of resilience, visibility, and control. I think that the term cybersecurity is evocative of agency and intelligence, arms and warfare, and it provokes the digital equivalent of wall- and moat-building. A digital wall or a moat still would have let a broken software update in, and it would not have kept the house warm. On the other hand, if a product is resilient, then performance degrades gracefully, and there are avenues for recovery, no matter whether there is an external attack or an internal defect. I use the term visibility similarly to some of the books on human factors, to mean that a product's state is shown plainly to the user. One can easily read from the Nest thermostat whether the furnace or A/C is on, and the temperature set points, but there ought to be the equivalent of an automobile's Check Engine light for conditions like excessive draw on the battery. By control, I mean that the user should have at least enough control to accept/reject new conditions affecting their safety; for example, enough control to accept/reject a software update and to roll back an update.
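Visibility and control can both be sketched in a few lines of Python. The first function below is a crude "Check Engine light": it raises an alert when the battery voltage keeps falling quickly across consecutive readings. The second gates a software update on the owner's approval and on the device being healthy. The names, the 20 mV threshold, and the sample readings are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import List


class Alert(Enum):
    NONE = "none"
    BATTERY_DRAIN = "battery drain"


def battery_alert(samples_mv: List[int], max_drop_mv: int = 20) -> Alert:
    """Flag a sustained drop: every reading falls by more than max_drop_mv."""
    drops = [earlier - later for earlier, later in zip(samples_mv, samples_mv[1:])]
    if drops and all(drop > max_drop_mv for drop in drops):
        return Alert.BATTERY_DRAIN
    return Alert.NONE


def should_install(owner_approved: bool, alert: Alert) -> bool:
    # The owner decides, and an unhealthy device does not take on new software.
    return owner_approved and alert is Alert.NONE


readings = [3900, 3850, 3790, 3720]  # millivolts, made-up values
alert = battery_alert(readings)
print(alert.value)                   # battery drain
print(should_install(True, alert))   # False
```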

There have not been any other major upsets of the Nest thermostat at our house, but winter is coming soon, and Nest has not produced any reassurances that a repeat of the arctic vortex event cannot occur. We will have to decide soon whether or not to replace the thermostat with a more conventional unit.