The Cloud Goes Down Sometimes

For two to three hours this afternoon, my employer wasn’t receiving outside email. We use Microsoft 365, and by mid-afternoon the DNS lookup for our mail protection gateway was simply failing. Emails bounced back to senders with “unable to look up host” errors. On Downdetector, Microsoft 365 was the number one reported issue, with reports peaking at around 16,000 at 3 PM Eastern.

Microsoft acknowledged the problem on X around 3:17 PM ET, stating that “a portion of service infrastructure in North America” wasn’t handling traffic correctly. By the time I left for the day, things appeared to be recovering—but we won’t know for certain until tomorrow morning whether everything is fully resolved.

This is the second M365 disruption in two days. Yesterday, a third-party networking issue briefly knocked out Teams and Outlook for thousands of users. Today’s problem appears to be internal to Microsoft’s own infrastructure.

You’re Not Alone

If you’ve been paying attention, you know this isn’t unique to Microsoft. The last twelve months have been rough for cloud reliability across the board.

January 2026: Just last week, Verizon’s cellular network went down for six hours, leaving customers across the country showing “SOS” on their phones. Emergency alert systems in D.C. and New York had to warn residents to find alternative ways to reach 911. Verizon blamed a software issue and offered $20 credits.

October 2025: AWS experienced its worst outage in years—15 hours of disruption traced to a DNS resolution failure in the US-EAST-1 region. A single DNS problem paralyzed over 3,500 companies across 60+ countries. Snapchat alone received 3 million outage reports. WhatsApp, DoorDash, Disney+, and McDonald’s were all affected.

October 2025: Azure Front Door, Microsoft’s global CDN, failed for more than eight hours, taking down Microsoft 365, the Azure Portal, and enterprise customers including Alaska Airlines.

July 2025: A networking configuration change in Azure’s East US2 region caused connectivity issues and resource allocation failures that lasted roughly 50 hours.

June 2025: Google Cloud went down for over seven hours after a policy change triggered a null-pointer crash loop. Gmail, Docs, Drive, Maps, and Gemini were all affected. Spotify, Snapchat, and Discord went down with them.

The Math Still Works

Here’s the thing: despite all of this, I still think the cloud is worth it.

Before cloud services, I lived through on-premises server failures, SANs that kept us up all night, and hardware failures where we waited days for parts. I once walked into a computer room in the Sears Tower after an HVAC failure to find servers shutting themselves down from thermal overload. Those incidents weren’t reported on Downdetector because they only affected one company—but they happened routinely, everywhere.

With the cloud, you accept rare, high-profile outages that affect everyone in exchange for eliminating the constant, low-level failures that used to be part of daily IT life. Microsoft, AWS, and Google have armies of engineers, redundant infrastructure, and monitoring capabilities that no single organization can match. When they fail, it makes the news. When your on-premises Exchange server failed, it just made your Thursday worse.

Is it frustrating when email goes down for a few hours? Absolutely. But if you’re running your own infrastructure, you’re almost certainly experiencing more total downtime—it’s just distributed across smaller incidents that don’t make headlines.

My rule of thumb: budget for a couple of hours of unplanned downtime per year from any major cloud service. If that’s unacceptable for your business, you need to architect around it: secondary providers, hybrid setups, or offline operation for the systems that are truly critical. But for most organizations, that tradeoff is still far better than the alternative.
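
For a sense of scale, here is a rough sketch in Python of what that rule of thumb works out to as an availability percentage. The two-hour figure is just my rule of thumb; the function names and the 99.9% comparison target are illustrative assumptions, not any provider’s actual SLA terms.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability_percent(downtime_hours: float) -> float:
    """Percentage of the year a service was up, given total hours of downtime."""
    return 100.0 * (1 - downtime_hours / HOURS_PER_YEAR)


def allowed_downtime_hours(target_percent: float) -> float:
    """Hours of downtime per year that fit inside an availability target."""
    return HOURS_PER_YEAR * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # Rule of thumb: about two hours of unplanned downtime per year.
    print(f"{availability_percent(2):.3f}%")          # 99.977%
    # Assumed comparison target of 99.9% ("three nines").
    print(f"{allowed_downtime_hours(99.9):.2f} hours")  # 8.76 hours
```

Two hours a year is roughly 99.977% uptime, while a 99.9% target leaves room for about 8.76 hours. If your business can’t live inside numbers like those, that is the signal to start designing around the provider.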

The cloud goes down sometimes. So does a server room. The difference is that when the cloud comes back up, we don’t have to fix it ourselves.
