Amazon Web Services Outage: Slack, Asana, Docker, and More Affected


An Amazon Web Services (AWS) outage has caused widespread disruption for internet services – with Slack, Asana, and Docker all apparently affected. User reports from outage tracking website Downdetector indicate issues in the US-EAST-1 region – with EU-WEST-1 also seemingly affected – and problems impacting multiple services that depend on AWS infrastructure.

Salesforce-owned Slack appears to have been affected by the outage, along with work management platform Asana and containerization platform Docker. Let’s take a look at what happened.

AWS Outage: Full Timeline

Shortly after midnight Pacific Daylight Time (PDT), Amazon posted a health service update saying they were investigating increased error rates and latencies for “multiple AWS services in the US-EAST-1 Region”.

Outage-tracking website Downdetector shows thousands of outage reports over a period of several hours (times shown in BST). Credit: Downdetector

The disruption appeared to involve DynamoDB, one of AWS’s core infrastructure services.

Two hours after the initial post, AWS revealed that they had identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region, saying the issue appeared to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1.
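For readers who want to see what a DNS resolution failure looks like from the client side, here is a minimal Python sketch. It assumes the standard regional hostname `dynamodb.us-east-1.amazonaws.com`; the `check_dns` helper and its `resolver` parameter are hypothetical names used for illustration and are not part of any AWS tooling.

```python
import socket

# Standard regional endpoint hostname for DynamoDB in US-EAST-1 (assumed here).
DYNAMODB_HOST = "dynamodb.us-east-1.amazonaws.com"


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Return (ok, detail): the sorted resolved IP addresses, or the error text.

    `resolver` is injectable so the check can be exercised without a network.
    """
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Raised when the name cannot be resolved at all.
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({entry[4][0] for entry in results})
    return bool(addresses), addresses


if __name__ == "__main__":
    ok, detail = check_dns(DYNAMODB_HOST)
    print("resolves" if ok else "DNS failure", detail)
```

A failed lookup here only shows that the name did not resolve from the machine running the check. It says nothing about the cause on AWS's side.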

Pictured: Downdetector user reports indicating issues in the US-EAST-1 region, with EU-WEST-1 also seemingly affected. Credit: Downdetector

At 2:27 AM, AWS said they were seeing significant signs of recovery.

About an hour later, they said that the underlying DNS issue had been “fully mitigated”, and most AWS Service operations were now succeeding normally.

See the full list of AWS health service updates below (all times PDT):

  • 12:11 AM: “We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region. We will provide another update in the next 30-45 minutes.”
  • 12:51 AM: “We can confirm increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. This issue may also be affecting Case Creation through the AWS Support Center or the Support API. We are actively engaged and working to both mitigate the issue and understand the root cause. We will provide an update in 45 minutes, or sooner if we have additional information to share.”
  • 1:26 AM: “We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region. This issue also affects other AWS Services in the US-EAST-1 Region as well. During this time, customers may be unable to create or update Support Cases. Engineers were immediately engaged and are actively working on both mitigating the issue, and fully understanding the root cause. We will continue to provide updates as we have more information to share, or by 2:00 AM.”
  • 2:01 AM: “We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery. This issue also affects other AWS Services in the US-EAST-1 Region. Global services or features that rely on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables may also be experiencing issues. During this time, customers may be unable to create or update Support Cases. We recommend customers continue to retry any failed requests. We will continue to provide updates as we have more information to share, or by 2:45 AM.”
  • 2:22 AM: “We have applied initial mitigations and we are observing early signs of recovery for some impacted AWS Services. During this time, requests may continue to fail as we work toward full resolution. We recommend customers retry failed requests. While requests begin succeeding, there may be additional latency and some services will have a backlog of work to work through, which may take additional time to fully process. We will continue to provide updates as we have more information to share, or by 3:15 AM.”
  • 2:27 AM: “We are seeing significant signs of recovery. Most requests should now be succeeding. We continue to work through a backlog of queued requests. We will continue to provide additional information.”
  • 3:03 AM: “We continue to observe recovery across most of the affected AWS Services. We can confirm global services and features that rely on US-EAST-1 have also recovered. We continue to work towards full resolution and will provide updates as we have more information to share.”
  • 3:35 AM: “The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now. Some requests may be throttled while we work toward full resolution. Additionally, some services are continuing to work through a backlog of events, such as Cloudtrail and Lambda. While most operations are recovered, requests to launch new EC2 instances (or services that launch EC2 instances such as ECS) in the US-EAST-1 Region are still experiencing increased error rates. We continue to work toward full resolution. If you are still experiencing an issue resolving the DynamoDB service endpoints in US-EAST-1, we recommend flushing your DNS caches. We will provide an update by 4:15 AM, or sooner if we have additional information to share.”
  • 4:08 AM: “We are continuing to work towards full recovery for EC2 launch errors (which may manifest as an Insufficient Capacity Error). Additionally, we continue to work toward mitigation for elevated polling delays for Lambda, specifically for Lambda Event Source Mappings for SQS. We will provide an update by 5:00 AM PDT.”
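Several of the updates above recommend that customers retry failed requests. One common way to do that without adding to the load on a recovering service is exponential backoff with jitter. The Python sketch below is a generic illustration, not AWS's own guidance or SDK code: `retry_with_backoff` and all of its parameters are hypothetical names, and the AWS SDKs ship their own built-in retry behaviour that would normally be configured instead.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep, rng=random.random):
    """Call `call()` until it succeeds or the attempts run out.

    Between attempts, waits a random time in
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter").
    The last error is re-raised if every attempt fails.
    `sleep` and `rng` are injectable so the logic can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the final error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

In practice `call` would wrap a single request, and the `except` clause would be narrowed to the errors that are actually safe to retry. Catching every exception, as this sketch does, is a simplification.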

Summary

An issue with Amazon Web Services (AWS) in the US-EAST-1 Region affected a number of online services, but it appears to have been largely resolved.

This is a live post and will be updated.

Have you been affected by the outage? Email tips@salesforceben.com
