Amazon holds engineering meeting following AI-related outages
Ecommerce giant says there has been a ‘trend of incidents’ linked to ‘Gen-AI 
assisted changes’


Amazon’s ecommerce business has summoned a large group of engineers to a 
meeting on Tuesday for a “deep dive” into a spate of outages, including 
incidents tied to the use of AI coding tools.

The online retail giant said there had been a “trend of incidents” in recent 
months, characterised by a “high blast radius” and “Gen-AI assisted changes” 
among other factors, according to a briefing note for the meeting seen by the 
FT.

Under “contributing factors” the note included “novel GenAI usage for which 
best practices and safeguards are not yet fully established”.

“Folks, as you likely know, the availability of the site and related 
infrastructure has not been good recently,” Dave Treadwell, a senior 
vice-president at the group, told employees in an email, also seen by the FT.

The note ahead of Tuesday’s meeting did not specify which particular incidents 
the group planned to discuss.

Amazon’s website and shopping app went down for nearly six hours this month in 
an incident the company said involved an erroneous “software code deployment”. 
The outage left customers unable to complete transactions or access functions 
such as checking account details and product prices.

Treadwell, a former Microsoft engineering executive, told employees that Amazon 
would focus its weekly “This Week in Stores Tech” (TWiST) meeting on a “deep 
dive into some of the issues that got us here as well as some short immediate 
term initiatives” the group hopes will limit future outages.

He asked staff to attend the meeting, which is normally optional.

Junior and mid-level engineers must have more senior engineers sign off on any 
AI-assisted changes, Treadwell added in the briefing note.

Amazon said the review of website availability was “part of normal business” 
and it aims for continual improvement.

“TWiST is our regular weekly operations meeting with a specific group of retail 
technology leaders and teams where we review operational performance across our 
store,” the company said.

Separately, the company’s cloud computing arm — Amazon Web Services — has 
suffered at least two incidents linked to the use of AI coding assistants, 
which the company has been actively rolling out to its staff.

AWS suffered a 13-hour interruption to a cost calculator used by customers in 
mid-December after engineers allowed the group’s Kiro AI coding tool to make 
certain changes, and the AI tool opted to “delete and recreate the 
environment”, the FT previously reported.

Amazon previously said the incident in December was an “extremely limited 
event” affecting only a single service in parts of mainland China. Amazon added 
that the second incident did not have an impact on a “customer facing AWS 
service”.

The FT previously reported that multiple Amazon engineers said their business units 
had to deal with a higher number of “Sev2s” — incidents requiring a rapid 
response to avoid product outages — each day as a result of job cuts.

Amazon has undertaken multiple rounds of lay-offs in recent years, most 
recently eliminating 16,000 corporate roles in January. The group has disputed 
the claim that headcount cuts were responsible for an increase in recent 
outages.

<https://www.ft.com/content/7cab4ec7-4712-4137-b602-119a44f771de>
