The Curse of the Silent Failure
By: Larry Klug
--- 04/25/2010

No news is not always good news.

A large part of our job involves the constant care and feeding of our clients' systems. From our most antiquated, old-school legacy systems to our most modern, cutting-edge mobile web apps, all are constantly looked after by a whole host of automated noise-makers. Triggers, scheduled jobs, services, scripts and the like are all dutifully taking measurements, monitoring thresholds, predicting behaviors, logging, delivering alerts, sending notifications and promoting escalations, 24 hours a day, 7 days a week. These automated cry-babies notify us of their respective aches and pains using a variety of means: emails, phone calls, text messages, IRCs, IMs, tweets, beeps, you name it. Typically, our days are less noisy when our systems are performing at their healthiest.

These messages are most effective when they bark at the strange and unusual. The more predictable their squawking becomes, the more likely it is to be overlooked. This poses an interesting puzzle: how does one account for the lack of noise due to a catastrophic failure? The dreaded silent failure.

One of our more delicate legacy systems sends out an email at 2:00 a.m. every day, broadcasting her intended task with the most unremarkable subject line of "Starting bigImport". At some point in the past 5 years, this message became invisible to all who receive it. This caused a recent silent failure to go unnoticed for much longer than it should have, overlooked by an entire team of recipients, all equally hypnotized by the silence. How do you protect yourself from the dreaded silent failure?
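One possible answer, offered only as a sketch, is to flip the alert around: rather than asking people to notice a routine message, have a separate watchdog stay quiet while the routine message keeps arriving and make noise only when it goes missing. The Python below illustrates that idea under loud assumptions. The 24-hour interval, the one-hour grace period and the function names are all hypothetical, and how the last "Starting bigImport" timestamp gets recorded is left out entirely.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical values for illustration; not taken from the real system.
EXPECTED_INTERVAL = timedelta(hours=24)  # the job is supposed to start daily
GRACE_PERIOD = timedelta(hours=1)        # slack before we call it missing


def is_overdue(last_seen: Optional[datetime], now: datetime) -> bool:
    """Return True when the expected check-in is late or never arrived."""
    if last_seen is None:
        return True
    return now - last_seen > EXPECTED_INTERVAL + GRACE_PERIOD


def watchdog_message(last_seen: Optional[datetime], now: datetime) -> Optional[str]:
    """Stay silent (None) while healthy; produce an alert only on silence."""
    if is_overdue(last_seen, now):
        return "ALERT: no 'Starting bigImport' seen since {}".format(last_seen)
    return None
```

Run on its own schedule, a check like this sends nothing on a normal day, so the only message anyone receives is the strange and unusual one. The watchdog can of course fail silently too, which is why it should live somewhere other than the system it is watching.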
