Amazon Web Services Says Outage Has Been Resolved

UPDATED with latest: The Amazon Web Services outage that, at one point, saw Amazon Music, Prime Video, Alexa, Signal, WhatsApp, Snapchat, IMDb, Disney+, Fortnite, Roblox and Wordle down or suffering slowdowns, has been resolved.

In a post to its support page at 3:53 p.m. PT, AWS announced that, by 3:01 p.m., “all AWS services [had] returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours.”

Oct 20 3:53 PM PDT Between 11:49 PM PDT on October 19 and 2:24 AM PDT on October 20, we experienced increased error rates and latencies for AWS Services in the US-EAST-1 Region. Additionally, services or features that rely on US-EAST-1 endpoints such as IAM and DynamoDB Global Tables also experienced issues during this time. At 12:26 AM on October 20, we identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints. After resolving the DynamoDB DNS issue at 2:24 AM, services began recovering but we had a subsequent impairment in the internal subsystem of EC2 that is responsible for launching EC2 instances due its dependency on DynamoDB. As we continued to work through EC2 instance launch impairments, Network Load Balancer health checks also became impaired, resulting in network connectivity issues in multiple services such as Lambda, DynamoDB, and CloudWatch. We recovered the Network Load Balancer health checks at 9:38 AM. As part of the recovery effort, we temporarily throttled some operations such as EC2 instance launches, processing of SQS queues via Lambda Event Source Mappings, and asynchronous Lambda invocations. Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered. By 3:01 PM, all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary.

PREVIOUSLY at 2:48 p.m. PT: Access is has returned to many sites that were interrupted this morning by a massive Amazon Web Services outage.

AWS indicated in a post that, as of 2:48 p.m. PT, throttling of traffic had been restored to “pre-event” levels. “There is a backlog of analytics and reporting data that we must process and anticipate that we will have worked through the backlog over the next two hours,” the statement read.

Read it in full below.

Oct 20 2:48 PM PDT We have restored EC2 instance launch throttles to pre-event levels and EC2 launch failures have recovered across all Availability Zones in the US-EAST-1 Regions. AWS services which rely on EC2 instance launches such as Redshift are working through their backlog of EC2 instance launches successfully and we anticipate full recovery of the backlog over the next two hours. We can confirm that Connect is handling new voice and chat sessions normally. There is a backlog of analytics and reporting data that we must process and anticipate that we will have worked through the backlog over the next two hours. We will provide an update by 3:30 PM PDT.

PREVIOUSLY at 8:10 a.m.: Hundreds of websites and apps that went offline temporarily in North America and Europe on Monday morning following an outage at an Amazon Web Services data center in North Virginia appeared to be functioning as normal as the U.S. woke up.

Earlier in the day, Amazon Services such as Amazon Music, Prime Video and Amazon Alexa went down or suffered slowdowns, alongside messaging apps such as Signal, WhatsApp and Snapchat as well as entertainment and game sites including IMDb, Disney+, Fortnite, Roblox and Wordle among many.

In its latest update at 7.29 a.m. PT related to the data center outage, AWS, which provides cloud services to thousands of businesses worldwide, said the situation was improving but gave no reason for the cause of the problem.

“We have confirmed multiple AWS services experienced network connectivity issues in the US-EAST-1 Region. We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause,” read the message.

In the backdrop, e-commerce experts said that the resumption of connectivity was just one part of the story, warning that Monday morning’s outage would likely have longer-term consequences for businesses operating on the web.

“When AWS sneezes, half the internet catches the flu. Outages like this cause frustrated users but also trigger a domino effect across payment flows,” said Monica Eaton, founder and CEO of e-commerce services provider Chargebacks911 and Fi911.

“What I expect now is a spike in ‘I never got my service’ or ‘I was charged twice’ claims. Many of those won’t be fraud, just confusion. But confusion is the No. 1 driver of chargebacks. If merchants sit back and wait for disputes to roll in, they will bleed revenue unnecessarily.”

She urged online businesses to get ahead of the curve by running duplicate charge sweeps, pushing out proactive notifications to affected users and issuing prompt refunds to impacted customers.

“The outage will end long before the disputes do. Any business that treats this as a one-day incident is already behind. Downtime happens, but silence and slow responses are what cause real damage.”

Ismael Wrixen, CEO of digital commerce platform ThriveCart, said the outage should serve a wake-up call for all operators in the space.

“Today’s outage isn’t just an ‘East Coast AWS’ problem; it’s a reminder that 100% uptime is a myth for everyone. The internet runs on shared infrastructure,” he said.

“The real story isn’t just that AWS had a critical issue, but how many businesses discovered their platform partner had no plan for it, especially outside of U.S. hours. This is a harsh wake-up call about the critical need for multi-regional redundancy and intelligent architecture.”

PREVIOUSLY, 1:50 a.m.: Hundreds of online services went temporarily offline across North America and Europe on Monday amid suggestions that the problem was linked to an outage at an Amazon Web Services data center in North Virginia.

Alongside Amazon services such as Amazon Music, Prime Video and Amazon Alexa a slew of other websites and apps also suffered problems, including Signal, Life360, Roblox, Zoom, Fortnight, IMDb, and Disney+ among many.

The issue appeared to be linked to a severe outage at an AWS facility in North Virginia, but it has yet to be confirmed if this is the sole cause of wider issues, which were reported both in North America and Europe.

AWS first reporting “increased error rates and latencies for multiple AWS services” just after midnight PT, with its latest message at 1.26 a.m. PT, saying the issue was ongoing.

“We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region,” read the message.

“This issue also affects other AWS Services in the US-EAST-1 Region as well. During this time, customers may be unable to create or update Support Cases. Engineers were immediately engaged and are actively working on both mitigating the issue, and fully understanding the root cause. We will continue to provide updates as we have more information to share, or by 2:00 AM.”

Aravind Srinivas, CEO of Perplexity AI, was one of the first to point to the AWS outage at the cause of the problem, with a post on X saying: “Perplexity is down right now. The root cause is an AWS issue. We’re working on resolving it.”

The issue first started hitting the news as people woke up in Europe to discover a number of websites and apps were not working.

Monitoring sites such as Downdetector started to report a spike in issues for multiple sites and apps at around 8 a.m. UK time (midnight PT).

Beyond entertainment and media sites, a number of banking sites were also impacted including the UK’s Bank of Scotland, Lloyds Bank and Halifax as well as cryptocurrency site Coinbase. The latter put out a statement saying all funds were safe.

The incident has echoes of an outage at cybersecurity company CrowdStrike in July 2024, which impacted 8.5 million Windows devices worldwide, disrupting airlines, banks and government services at an estimated cost of $10 billion overall.

The exact cause of the outage has yet to be ascertained.

Source link