Improving IoT System Robustness Using Watchdog Timers

Contributed By DigiKey's North American Editors

2016-12-29

While security is a necessary emphasis when designing for the Internet of Things (IoT), what’s often overlooked is the importance of developing systems that are robust enough to recover from failure without human intervention. To ensure this robustness, designers should look closely at the humble watchdog timer (WDT), which now comes in a variety of forms ranging from simple timers to smart, integrated watchdogs.

This article revisits the fundamentals of internal and external WDTs, then introduces some of the latest WDT devices and shows how to use them to ensure system robustness.

Why watchdog timers are critical for the IoT

With billions of IoT devices being deployed in the field, it would be impossible for a technician to service them in a timely manner if something goes wrong. As a result, IoT systems must be able to detect and recover from faults on their own without any human intervention.

Watchdogs come in many different shapes and sizes, but can generally be categorized as simple timers, windowed timers and smart watchdogs. Watchdogs may exist internally to microcontrollers as hardware and software, externally as hardware, and even as separate microcontrollers with both hardware and software components. No matter which watchdog solution is used, its sole purpose is to monitor the system and recover it when something goes wrong. Each type of watchdog has its own characteristics and design challenges that developers need to consider for a robust IoT system design.

Internal watchdog fundamentals

Internal watchdog timers are hardware peripherals that are included in nearly every microcontroller and can interact with the onboard peripherals and system clock (Figure 1). On many devices the internal watchdog timer is disabled by default, and the developer must set a timeout period before enabling it; on others, including the MSP430 family, the watchdog starts running at power-up. If the software locks up, or there is a hardware fault that affects the software’s execution, the watchdog timer will expire and force the microcontroller to reset. The reset clears the fault condition and allows the microcontroller to reinitialize the system.

Figure 1: Watchdog timers are included in most microcontrollers, such as Texas Instruments’ MSP430G2210, and can reset the processor when software locks up. (Image: Texas Instruments)

Internal watchdog timers, while quite simple in theory, require a fair amount of thought to implement properly. For example, the software that is developed for the watchdog cannot simply clear the watchdog timer blindly. The software should perform a check on the system to ensure that all tasks and hardware are operating properly before clearing the watchdog.
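
One common way to perform such a check is to have each task set a "check-in" flag and to clear the watchdog only when every flag is set. The following is a minimal sketch under that assumption. The task names, wdt_checkin(), wdt_service() and the hw_kick_watchdog() hook are hypothetical and do not come from any vendor library; on a real part, the hook would write the device's own watchdog clear sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task identifiers, one bit per monitored task. */
enum {
    TASK_SENSOR = 1u << 0,
    TASK_COMMS  = 1u << 1,
    TASK_LOGGER = 1u << 2,
    TASK_ALL    = TASK_SENSOR | TASK_COMMS | TASK_LOGGER
};

static volatile uint8_t checkins;  /* flags set since the last successful kick */
static unsigned kick_count;        /* stand-in for the hardware, for illustration */

/* Hypothetical hardware hook: a real part would clear its watchdog counter here. */
static void hw_kick_watchdog(void)
{
    kick_count++;
}

/* Each task calls this once per loop, after it has done useful work. */
void wdt_checkin(uint8_t task)
{
    checkins |= task;
}

/* Called periodically from the main loop, not from a free-running timer interrupt.
 * Kicks the watchdog only if every task has checked in; returns true if it did. */
bool wdt_service(void)
{
    if ((checkins & TASK_ALL) != TASK_ALL) {
        return false;              /* a task is stuck: let the watchdog expire */
    }
    checkins = 0;                  /* require fresh check-ins for the next period */
    hw_kick_watchdog();
    return true;
}
```

With this arrangement, a single stalled task is enough to withhold the clear, so the watchdog expires even though the main loop itself is still running.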

When developing an internal watchdog solution, there are several tips that developers should attempt to adhere to:

Never disable the watchdog for any reason. In fact, when selecting a microcontroller, make sure that the watchdog, once enabled, can never be disabled.
Never clear the watchdog in a periodic interrupt independent from software functionality checks.
Verify that the watchdog timer is an independent watchdog. Independent watchdogs have a separate clock that allows them to detect if the system clock has halted.
Use a watchdog that has a windowed watchdog feature. These watchdogs require a minimum time before the watchdog can be cleared. If an attempt is made prior to the start of the window, the watchdog will reset the system. This prevents runaway software from overriding the watchdog timer.
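
The window behavior described in the last tip can be illustrated with a small host-side model. This is only a sketch: the window_wdt_t type, the tick units and the function names are assumptions made for illustration, and real devices implement this logic in hardware behind their own register interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a windowed watchdog; times are in timer ticks. */
typedef struct {
    uint32_t window_open;   /* earliest tick at which a clear is accepted */
    uint32_t timeout;       /* tick at which the watchdog expires         */
    uint32_t counter;       /* ticks elapsed since the last valid clear   */
    bool     reset_pending; /* set when the model would reset the system  */
} window_wdt_t;

void window_wdt_init(window_wdt_t *w, uint32_t window_open, uint32_t timeout)
{
    w->window_open = window_open;
    w->timeout = timeout;
    w->counter = 0;
    w->reset_pending = false;
}

/* Advance time by one tick; reaching the timeout without a clear requests a reset. */
void window_wdt_tick(window_wdt_t *w)
{
    if (w->counter < w->timeout) {
        w->counter++;
    }
    if (w->counter >= w->timeout) {
        w->reset_pending = true;
    }
}

/* Attempt to clear the watchdog. A clear before the window opens is
 * treated as a fault, just like a missed clear. */
void window_wdt_clear(window_wdt_t *w)
{
    if (w->counter < w->window_open) {
        w->reset_pending = true;    /* too early: runaway code suspected */
    } else if (!w->reset_pending) {
        w->counter = 0;             /* inside the window: restart timing */
    }
}
```

In this model, code stuck in a tight loop that clears the watchdog on every pass trips the early-clear check on its second attempt, which a simple timer would never detect.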

Internal watchdogs are a good step towards building a robust embedded system, but on their own they don’t provide a very robust solution. In order to really up the ante with respect to robustness, developers need to consider external watchdogs.

Increase robustness with external watchdogs

No matter how careful developers are in their internal watchdog implementation, internal watchdogs can’t always save the day. Many implementations have flaws, two examples of which are sharing the system clock, and having a disable option.

When a system needs to operate on its own in the field, using an external watchdog has many advantages, such as:

Performing a hard system reset that ensures the microcontroller is power cycled, which in turn power cycles the internal peripherals.
Separating the watchdog from the microcontroller’s oscillator circuit.
Providing a completely independent process for monitoring the system.

All of these contribute to system robustness, although there are also a few disadvantages to using an external WDT. These include an increase in hardware costs due to the addition of an IC as well as an increase in system complexity. However, as we will see, these are minor disadvantages when all things are considered. Let’s examine how we might develop a simple, robust external watchdog circuit (Figure 2).

Figure 2: Example external watchdog circuit that is monitoring the behavior and state of a microcontroller, which in turn has its own internal watchdog timer. (Diagram drawn using DigiKey Scheme-it®)

The circuit consists of a microcontroller running its own internal watchdog timer, in addition to an external watchdog circuit. In this example, the watchdog circuit is a Texas Instruments TPL5010 Nano-power System Timer with Watchdog Function. The external watchdog has a reset output pin that connects directly to the microcontroller reset pin. Each time the TPL5010 toggles its WAKE pin, it expects the microcontroller to respond by issuing a heartbeat on the DONE pin. If the microcontroller does not respond, the TPL5010 pulls the reset line low to reset the microcontroller. The watchdog period is set by adjusting R2.
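
On the microcontroller side, the WAKE/DONE handshake described above might be handled along the following lines. This is a sketch under stated assumptions: set_done_pin() is a hypothetical stand-in for the GPIO write on the pin wired to DONE, on_wake_edge() is a hypothetical handler name, and the health check result is passed in by the caller. Consult the TPL5010 datasheet for the actual DONE pulse timing requirements.

```c
#include <stdbool.h>

static bool done_level;        /* current level driven on the DONE line (stand-in) */
static unsigned done_pulses;   /* rising edges driven on DONE, for illustration    */

/* Hypothetical GPIO hook: on hardware this would write the port pin wired to DONE. */
static void set_done_pin(bool level)
{
    if (level && !done_level) {
        done_pulses++;
    }
    done_level = level;
}

/* Call when a rising edge is seen on WAKE (from an interrupt or a polling loop).
 * The heartbeat is sent only if the application's self-check passed; otherwise
 * the pulse is withheld so that the external watchdog resets the system.
 * Returns true if the heartbeat was sent. */
bool on_wake_edge(bool system_healthy)
{
    if (!system_healthy) {
        return false;
    }
    set_done_pin(true);    /* a real driver may need to hold this level briefly */
    set_done_pin(false);
    return true;
}
```

Tying the DONE pulse to a self-check, rather than sending it unconditionally, applies the same rule given earlier for internal watchdogs: never clear the watchdog blindly.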

Adding a simple, low-cost circuit like this to a design is an excellent path to improving system robustness. Developers don’t even need to wait for a board to be spun to start testing. The TPL5010 development kit can easily be set up with other development kits to test watchdog functionality long before the final hardware is available (Figure 3).

Figure 3: The Texas Instruments TPL5010 development board costs less than $30 and provides headers not only for connecting a microcontroller for testing, but also for configuring the TPL5010 and measuring its current consumption. (Image: Texas Instruments)

When selecting an external watchdog, there are several factors that a developer needs to consider such as:

Minimum and maximum timeout periods
Window watchdog support
Current consumption
Minimizing pin count
Potential failure modes (if any exist)

Designing a smart watchdog solution

The ultimate watchdog for an IoT device is a smart watchdog. A smart watchdog is a supervisory microcontroller that, in addition to performing basic heartbeat monitoring, can also monitor system communications (Figure 4). There can be instances where the microcontroller stops responding to the Internet, but is still successfully clearing the external watchdog. When this happens, a command could be sent over the Internet to reset the microcontroller. The smart watchdog can monitor the communication lines, such as UART transmit and receive lines, for a special command that tells it to restart the system.

In the example, a communication module is connected to both the microcontroller and the smart watchdog. Notice that the smart watchdog also has an external TPL5010. The reason is that the smart watchdog is itself a microcontroller that runs software and, in order to be robust, should have its own external watchdog.

Figure 4: Example smart watchdog system architecture. (Diagram drawn using DigiKey Scheme-it)
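
The communication monitoring described above can be sketched as a small byte-stream matcher that the smart watchdog feeds from its UART receive path. The article does not define the special command, so the "!RST\n" string, the cmd_monitor_t type and the function names below are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical restart command; the actual format is a design decision. */
static const char RESTART_CMD[] = "!RST\n";
#define RESTART_CMD_LEN (sizeof(RESTART_CMD) - 1u)

typedef struct {
    size_t matched;   /* number of command bytes matched so far */
} cmd_monitor_t;

void cmd_monitor_init(cmd_monitor_t *m)
{
    m->matched = 0;
}

/* Feed one received byte (for example, from a UART receive interrupt).
 * Returns true when the complete restart command has just been seen. */
bool cmd_monitor_feed(cmd_monitor_t *m, uint8_t byte)
{
    if (byte == (uint8_t)RESTART_CMD[m->matched]) {
        m->matched++;
    } else {
        /* Mismatch: start over, letting this byte begin a new match.
         * This simple fallback is sufficient because the first byte
         * of the command does not recur later in the command. */
        m->matched = (byte == (uint8_t)RESTART_CMD[0]) ? 1u : 0u;
    }
    if (m->matched == RESTART_CMD_LEN) {
        m->matched = 0;
        return true;
    }
    return false;
}
```

When cmd_monitor_feed() returns true, the smart watchdog would pulse the reset line in the same way it does for a missed heartbeat. A fielded design would likely also want to authenticate the command, which is outside the scope of this sketch.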

When designing a smart watchdog, developers need to consider several important factors such as:

Heartbeat characteristics
Input/output availability
Cost
Available flash memory
Energy consumption
Failure modes
Minimizing physical footprint

There are several microcontrollers available today that could serve as good smart watchdogs. First is the Texas Instruments MSP430G2xx with 2 kB of flash and 4 I/O lines. This microcontroller has just enough coding space and enough pins to develop a very simple smart watchdog implementation.

For applications that require communication monitoring, a few more I/O lines and additional memory can be quite useful. In such instances, the MSP430G2231IPW14R or the MSP430G2553IPW20R would make good candidates.

That said, it’s the smart watchdog software that truly makes the watchdog “smart”, and that’s where developers need to pay attention and write a little bit of code. The code doesn’t have to be complicated; in fact, the simpler the better. Small, simple, provable software is best for a watchdog. A simple code example for the MSP430, inspired by the TPL5010 Evaluation Module documentation, is shown (Code listing):

#include <msp430.h>
#include <stdbool.h>

// Set by the main loop, cleared by the heartbeat interrupt
static volatile bool Reset = true;

void main(void)
{
    WDTCTL = WDTPW | WDTHOLD;   // Stop watchdog timer

    P1OUT |= BIT0;              // Set P1.0 to high (reset line inactive)
    P1DIR |= BIT0;              // Set P1.0 to output direction
    P1DIR &= ~BIT1;             // Set P1.1 to input direction

    P2IES &= ~BIT0;             // P2.0 Lo/Hi edge
    P2IFG &= ~BIT0;             // P2.0 IFG Cleared
    P2IE |= BIT0;               // P2.0 Interrupt Enabled

    __bis_SR_register(GIE);     // Enable global interrupts so Port_2 can run

    while(1)
    {
        __delay_cycles(500000); // Set Delay

        // If true then the heartbeat was not received
        if(Reset == true)
        {
            // The heartbeat was not received. Reset the processor
            P1OUT &= ~BIT0;
            __delay_cycles(100); // Set Delay
            P1OUT |= BIT0;
            Reset = false;       // Allow one period for the processor to restart
        }
        else
        {
            Reset = true;
        }
    }
}

// Port 2 interrupt service routine
#pragma vector=PORT2_VECTOR
__interrupt void Port_2(void)
{
    P2IFG &= ~BIT0;             // P2.0 IFG Cleared
    P2IE |= BIT0;               // P2.0 Interrupt Enabled
    Reset = false;              // Heartbeat received
}

Code listing: Example MSP430 software to monitor the microcontroller heartbeat. If the heartbeat is not received within the expected period, then the microcontroller is reset. (Source: Inspired by Texas Instruments’ SNAU173 Application Note).

The example code shows the smart watchdog being initialized and then waiting a specific time before checking to see if the microcontroller should be restarted. The smart watchdog sets the Reset variable to true, assuming that the microcontroller has failed. It is up to the microcontroller to send the heartbeat pulse, which triggers the interrupt that sets the Reset variable back to false. There are plenty of features that could be added to the sample code such as:

Windowed watchdog
Entering low power states
Clearing the TPL5010 that is watching the smart watchdog
Monitoring communication lines
Tracking how many resets have occurred

The possibilities are limited only by what is truly necessary to ensure a robust design.
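
As one illustration, tracking resets could be sketched as follows. The reset_log_t type, the function names and the limit of three consecutive resets are hypothetical choices, and the counts here live in RAM; keeping them across a reset of the smart watchdog itself would require nonvolatile storage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit after which a persistent fault is flagged. */
#define MAX_CONSECUTIVE_RESETS 3u

typedef struct {
    uint16_t total_resets;        /* saturating count of all resets issued */
    uint8_t  consecutive_resets;  /* resets since the last valid heartbeat */
} reset_log_t;

/* Record that the watchdog has just reset the monitored microcontroller.
 * Returns true when the consecutive count has reached the limit. */
bool reset_log_record(reset_log_t *log)
{
    if (log->total_resets < UINT16_MAX) {
        log->total_resets++;
    }
    if (log->consecutive_resets < UINT8_MAX) {
        log->consecutive_resets++;
    }
    return log->consecutive_resets >= MAX_CONSECUTIVE_RESETS;
}

/* Record a valid heartbeat: the system recovered, so clear the streak. */
void reset_log_heartbeat(reset_log_t *log)
{
    log->consecutive_resets = 0;
}
```

In the listing above, reset_log_record() would be called where the reset line is pulsed and reset_log_heartbeat() from the Port 2 interrupt. What to do when the limit is reached, such as reporting the fault over the communication link, is left to the design.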

Conclusion

Security for the IoT is important, but developers would do well not to lose sight of system robustness. Watchdogs will play a critical role in the IoT, as devices are not only deployed into the field far from human access, but are also expected to operate nearly flawlessly 24 hours a day, 7 days a week. Internal watchdogs provide developers with minimal recovery opportunities, while external watchdogs and smart watchdogs, with a bit of extra software, open the field to robust and recoverable designs.
