Amazon Web Services outage highlights complex data management

Major companies using Amazon.com's data services got a painful lesson this week in how the complexity and market dominance of the company's cloud unit make it hard for them to back up their data with other providers, analysts and experts told Reuters.

The prolonged outage was caused by an impairment of several network devices in the Virginia data center region of Amazon Web Services (AWS). It temporarily knocked out streaming platforms Netflix Inc and Disney, trading app Robinhood Markets Inc, and Amazon's own e-commerce site, which makes heavy use of AWS.

An Amazon spokesman told Reuters on Wednesday that the issues had been resolved.

The outage, in the region AWS calls US-EAST-1, underscored the huge trail of damage a single network problem can cause and highlighted how difficult it is for companies to spread their cloud computing across providers.

According to research firm IDC, Amazon is the world's biggest cloud computing firm, with 24.1% of the overall market. Rivals such as Microsoft Corp, Alphabet Inc's Google and Oracle Corp are trying to lure customers into using parts of their clouds, often as backups.

Naveen Chhabra, a senior analyst at research firm Forrester, said that building a complex online service that can be shifted from one provider to another is far from simple. Rather than being a single cloud, AWS is made up of hundreds of different services, from basic building blocks like computing power and storage to advanced services like high-speed databases and artificial intelligence training.

A given website, Chhabra said, might use several dozen of those individual services, each of which must work for the site to function. Building a backup on another cloud provider is hard because some services are proprietary to AWS and others work very differently at a rival provider.

"It's like saying, 'Can I put an SUV body on a sedan chassis?' Maybe, if everything is the same and lines up. There is no guarantee," Chhabra said.

AWS makes it relatively cheap to send data into its cloud but charges higher prices, known as egress fees, to move data out of its cloud to a rival. This makes it hard for businesses to diversify.

Matthew Prince, chief executive of Cloudflare Inc, said outages like this one amplify the case for a world where egress fees are eliminated and customers can be multi-cloud. "I think that would increase the faith of customers in the cloud," he said.

Angelique Medina, head of product marketing at Cisco Systems Inc's ThousandEyes, said AWS has critical dependencies among its own services, which are linked in ways that can cause one to fail when another fails. AWS has many complex services built on top of its own basic services, so a problem in a basic function like networking can cascade through every service that depends on it.

AWS said the outage was "affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates." Medina said AWS seems to have critical services clustered in its US-EAST-1 region, where another outage last year had a widely felt impact.

"That's where a lot of their critical dependencies have historically been located. They have diversified a bit over time," Medina said.

Forrester's Chhabra said Amazon has done a lot of heavy lifting to make its own services resilient. What Amazon does not do for its customers is build their applications in a way that can withstand an outage by tapping multiple locations or providers.

Building applications that way can be expensive and often involves extra work that might not be worth it, given that cloud outages are relatively rare.

Charly Fei, product lead for Interchain Communication at The Interchain Foundation, said there is always a tradeoff between something that is secure and something that is usable, and no solution gets everything perfect.