When Tech Fails, It Is Usually With a Whimper Instead of a Bang

While in some corners of Silicon Valley people worry about the risks of A.I., a simple failed software update caused a worldwide outage.

For a couple of years now, the artificial intelligence community has been warning that there is a chance their work will go south and humanity will end in a conflagration worthy of a superhero movie.

Friday brought a pointed reminder that disaster is at least as likely to creep in quietly, perhaps from a piece of technology so mundane that hardly anyone knows it exists.

Our lives are built on systems piled on systems. As we board airplanes, cross bridges, pay bills, download updates, track our children at camp and generally try to make it through the day, we take them for granted.

Until they fail.

This week’s global software outage, immediately proclaimed as the biggest in history, was not caused by terrorists or A.I. or rogue hackers demanding billions in ransom. It wasn’t even done as a lark by some off-the-charts smart teenager. Those are the Hollywood versions. Instead, it was a routine upgrade that somehow went off the rails.

CrowdStrike, a Texas company, specializes in protecting corporate clients from cyberthreats. It has been very successful at this. This time, though, the threat came from CrowdStrike itself, a problem for which it seemed unprepared.

The trouble began with a small Windows software update CrowdStrike sent to its customers on Thursday night. For some reason, this crashed every computer it touched. “Your PC ran into a problem,” users were cheerily informed. “It looks like Windows didn’t load correctly,” messages announced. The backdrop was the color of a perfect sky, also known as the Blue Screen of Death.

Any system can fail, and usually in unexpected ways. The Great Blackout of 1965, another contender for the greatest technology stumble of all time, shut off the electrical grid for 30 million people on the Eastern Seaboard. Silicon Valley couldn’t be blamed because Silicon Valley barely existed, but the culprit — a bad relay at a Canadian power station that caused a cascade of issues that broke the system — was equally mundane.

Living in the modern world is an act of faith. Most of the time we don’t think about it. Then the airplane we’re on shakes with turbulence. Or we read about how a door blew off. Or how planes crashed. Or — and this happened to people on thousands of flights on Friday — we can’t even get on the plane. It was worldwide pandemonium.

Planes are for obvious reasons a central theater of anxiety when technology is having a breakdown. But even those who weren’t trying to travel were upset on Friday. The computers couldn’t manage to get out of the passive voice to assign responsibility for their collapse, much less fix themselves, and the humans, at least initially, were not much better.

“It’s a mess,” Brody Nisbet, an executive at CrowdStrike, wrote on X as he suggested a possible workaround. “I’ve no further actionable help to provide at the minute.” He added a disappointed face emoji: 😞.

The message was later deleted.

CrowdStrike likely failed to do its due diligence, programmers said. Trying the patch out on a variety of Windows machines before sending it out to customers could have helped detect the issue.

“They should have had a test machine to emulate some of their clients’ old boxes and they would have seen the Blue Screen of Death,” said Matt Mitchell, a hacker and founder of CryptoHarlem, a cybersecurity education and advocacy organization.

CrowdStrike is not some tiny start-up. Founded in 2011, it has 8,000 employees and a stock market valuation that was heading to $100 billion, at least before the outage caused some investors to jump ship. CrowdStrike shares closed down 11 percent Friday.

If the company doesn’t have the name recognition of some bigger tech firms, it has its share of arrogance. A portion of its website is devoted to trash-talking its competitors. “Microsoft’s security products can’t even protect Microsoft. How can they protect you?” CrowdStrike asks. Avoid Palo Alto Networks, it demands: “Don’t settle for a high-cost platform that’s hard to use, hard to deploy, and hard to manage.”

A message Friday from George Kurtz, the chief executive, seemed to minimize the outage, calling it “a defect found in a single content update for Windows hosts.” People complained that Mr. Kurtz was slow to offer an apology. (Hours later, he said, “I want to sincerely apologize directly to all of you for today’s outage.”) CrowdStrike did not respond to a request for further comment.

IT workers at affected companies were faced with a choice: walk around to each offline machine and remove the bit of flawed code, or wait and hope for a solution from CrowdStrike.

“The workaround works if you can walk to every laptop, type on the keyboard, and reboot it manually,” said Mikko Hypponen, a security expert and chief research officer at WithSecure, a cybersecurity company. “The problem that this poses is that normally large enterprises, which is what CrowdStrike customers are, maintain their fleet” with centralized controls.

In other words, the traditional way to fix a balky computer — turning it off and then turning it on again — was still the only solution, even as the computers themselves are now increasingly woven into worldwide networks. But the travelers trapped at the airport could not reboot those screens that were preventing them from flying.

What Mr. Kurtz called “a defect found in a single content update” is a modern-day threat. Only a few years ago, software updates were more complicated, more tedious. Every computer system was not linked to every other system, which meant failures were more contained.

“When it comes to cybersecurity, we talk about defense in depth — having a moat and then archers and a gate around the castle. We talk about having it set up where there is no single point of failure. But we are creating a situation where there is a single point of failure,” said Mr. Mitchell, the hacker.

People took the 1965 blackout in stride. The CrowdStrike outage was disruptive, but it has not yet been linked to any deaths. People have the weekend to complete their interrupted journeys. If CrowdStrike is lucky, the trouble will be forgotten within days if not hours.

Some day, though, the rest of us may not be so lucky, and some piece of boring technology — overloaded, neglected or poorly installed — will cause a genuine disaster. A software breakdown that causes a societal breakdown is probably better odds than A.I. bringing about world peace. The more networked the world gets, the greater the danger. It would be a stupid way to go, as the poets anticipated long ago. “This is the way the world ends/ Not with a bang but a whimper,” wrote T.S. Eliot. These days, of course, he would add a thumbs-down emoji.
