The AWS Data Ecosystem Grand Tour - Where Your AWS Data Lives
Written by Alex Rasmussen on December 3, 2019
This article is part of a series. Here are the rest of the articles in that series:
- Introduction
- Where Your AWS Data Lives
- Block Storage
- Object Storage
- Relational Databases
- Data Warehouses
- Data Lakes
- Key/Value and "NoSQL" Stores
- Graph Databases
- Time-Series Databases
- Ledger Databases
- SQL on S3 and Federated Queries
- Search
- Streaming Data
- File Systems
- Data Ingestion
- ETL
- Processing
- Data Interfaces
- Training Data for Machine Learning
- Data Security
- Business Intelligence
In this article, we'll talk about the global network of data centers that run AWS's services, and how the design of this network influences how services look, how they're used, and how much they cost to run.
The Geography of AWS
AWS operates a set of regions spread all over the world. Within each AWS region are multiple availability zones (AZs for short). These AZs each consist of one or more data centers and hundreds of thousands of servers. AZs within a region are geographically close to each other (less than 100 miles apart) and connected with high bandwidth, low latency links. Each AZ's power, cooling and Internet connectivity are isolated from the others, so that a power failure or connectivity disruption in one AZ shouldn't impact any of the others.
Having geographically distributed regions lets AWS's tenants operate services geographically close to their customers, which lets those services respond to client requests with (typically) lower latency. In some cases, geographic proximity to customers is also required for legal reasons[1]. Finally, and perhaps most importantly, being able to replicate services across geographically distant regions allows a service to continue operating even if an entire region is temporarily unavailable or destroyed due to some kind of large-scale disaster. Replicating services across different AZs within a single region provides a similar (but reduced) degree of disaster protection. The trade-off is that communication between AZs is faster than communication between regions, since the AZs are closer to one another and share high-bandwidth, low-latency links.
In addition to those geographically distributed regions, AWS operates a content distribution network (or CDN) called Amazon CloudFront with hundreds of globally distributed edge locations that allow its tenants to cache data even closer to their users if needed[2]. At time of writing, AWS has also just announced AWS Outposts (an AWS-managed rack in your company's data center providing API-compatible access to some AWS services), AWS Local Zones (which is likely a bunch of AWS Outposts in an AWS-managed co-location facility within a metro area), and AWS Wavelength (which is likely a bunch of AWS Outposts inside of various telecommunications companies), all of which are designed in one way or another to get AWS services closer to where a tenant's users are.
What This Means for Transfer Cost
The physical geography of the AWS network has direct implications for how AWS's tenants are billed for data transfer within that network. Typically, data transfer in AWS becomes more expensive the more constrained bandwidth becomes between the transfer's source and destination. In some cases, the cost of transferring data can meet or exceed the cost of storing it! It's important to understand why different kinds of transfer are billed this way, since it's consequential for most of the services we'll be covering in later articles.
Data transfer between servers within the same AZ is free[3], since servers within an AZ are connected to one another by a network with a lot of bandwidth capacity. AZs within a region have high-bandwidth connectivity, but it's nowhere close to the bandwidth within a single AZ, so data transfer between AZs costs money. The connections between regions are much more bandwidth-constrained than those between a given region's AZs, so data transfer between regions costs a lot more than transfer within the same region[4].
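As a rough illustration of the three scopes described above, here's a minimal Python sketch that classifies a server-to-server transfer by how far it travels. The `Placement` class, the `transfer_scope` function, and the scope labels are hypothetical names invented for this example; they aren't part of any AWS API, and the AZ names are just examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a server lives: a region and an AZ within it (hypothetical type)."""
    region: str  # e.g. "us-west-2" (Oregon)
    az: str      # e.g. "us-west-2a"


def transfer_scope(src: Placement, dst: Placement) -> str:
    """Classify a transfer as same-AZ, cross-AZ, or cross-region."""
    if src.region != dst.region:
        return "cross-region"
    if src.az != dst.az:
        return "cross-az"
    return "same-az"


# Scopes ordered from least to most bandwidth-constrained, which per the
# discussion above is also cheapest to most expensive.
SCOPES_BY_COST = ["same-az", "cross-az", "cross-region"]

oregon_a = Placement("us-west-2", "us-west-2a")
oregon_b = Placement("us-west-2", "us-west-2b")
sao_paulo_a = Placement("sa-east-1", "sa-east-1a")

print(transfer_scope(oregon_a, oregon_a))     # same-az
print(transfer_scope(oregon_a, oregon_b))     # cross-az
print(transfer_scope(oregon_a, sao_paulo_a))  # cross-region
```

The sketch deliberately says nothing about actual prices; it only captures the ordering, which is the part that follows from the network's geography.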
Transfer into AWS from the Internet is free. After all, data coming into AWS's network means that it's being stored, served and processed by AWS services, which means more money for AWS. Transfer from AWS to the Internet costs money, however, and it costs more money if the source region is in a part of the world where Internet bandwidth is harder to come by. For instance, outbound data transfer from the São Paulo region is a lot more expensive than outbound transfer from the Oregon region.
It's not entirely clear to me at time of writing how transfer pricing for AWS Outposts, AWS Wavelength, and AWS Local Zones works. It's possible that AWS will treat Local Zones and Wavelength as either regions or AZs and price data transfer for them accordingly, and that it will treat traffic to AWS Outposts the same as traffic to and from the Internet as far as pricing is concerned. That's just a guess, though, and right now not all of those services are available unless you're a big enough tenant to get invited.
Finally, transfer from AWS to CloudFront is free. Moving traffic away from data centers and out to edge locations sheds load from the data centers and cuts transfer costs, so AWS incentivizes its users to shift traffic toward the edge by making that shift free.
To summarize: if you're going to replicate data across availability zones or regions for increased durability or availability, it's going to cost money. Transfer within an AZ is cheaper than transfer between AZs in a region, which is in turn cheaper than transfer between regions. If you're just looking to get data to your customers faster, consider saving money by using CloudFront or a similar CDN instead of using inter-region replication to accomplish the same thing.
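To make that comparison concrete, here's a small Python sketch of a back-of-the-envelope transfer bill. Every rate in `HYPOTHETICAL_RATE_PER_GB` is a made-up placeholder chosen only to reflect the ordering described in this article; these are not AWS's actual prices, and the path names and the `monthly_transfer_cost` function are invented for illustration.

```python
from typing import Mapping

# Illustrative per-GB rates in USD. These numbers are invented to mirror the
# ordering in the text (free < cross-AZ < cross-region); they are NOT real
# AWS prices. Check the current AWS pricing pages before relying on any of it.
HYPOTHETICAL_RATE_PER_GB = {
    "internet-in": 0.00,    # transfer into AWS from the Internet
    "same-az": 0.00,        # between servers in one AZ
    "to-cloudfront": 0.00,  # from AWS out to CloudFront
    "cross-az": 0.01,       # between AZs in one region
    "cross-region": 0.02,   # between regions
}


def monthly_transfer_cost(gb_by_path: Mapping[str, float]) -> float:
    """Sum the transfer charges for a month, given GB moved over each path."""
    total = 0.0
    for path, gigabytes in gb_by_path.items():
        total += HYPOTHETICAL_RATE_PER_GB[path] * gigabytes
    return total


# Two ways of getting 1,000 GB per month closer to far-away users. This only
# counts the transfer legs discussed above and ignores every other charge
# (storage, requests, delivery from the edge to users, and so on).
replicate_to_second_region = monthly_transfer_cost({"cross-region": 1000})
push_to_cdn = monthly_transfer_cost({"to-cloudfront": 1000})

print(f"inter-region replication: ${replicate_to_second_region:.2f}")
print(f"origin to CloudFront:     ${push_to_cdn:.2f}")
```

With real numbers plugged in, a table like this is a quick way to check whether a replication plan's transfer bill is in the same ballpark as its storage bill before committing to it.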
Onward and Service-ward
In this article, we took a look at the global data center network that runs AWS services. In the next article, we'll look at AWS services for block storage and object storage, including two of the oldest and most widely used AWS data services, Amazon Simple Storage Service (S3) and Amazon Elastic Block Store (EBS).
If you'd like to get notified when new articles in this series get written, please subscribe to the newsletter by entering your e-mail address in the form below. You can also subscribe to the blog's RSS feed. If you have any questions, comments, or corrections relating to any article in this series, please contact me.
[1] For instance, under GDPR (the European Union's data protection laws), information collected about citizens of the EU must either physically reside within the EU or within a jurisdiction with similar data protections. These kinds of laws are often called "data sovereignty" laws.
[2] AWS is also continuing to invest in pushing compute out to those edge locations, but since we're focusing on data we won't cover that part of it here.
[3] "Free" here actually means free if you're not transferring out from an elastic IP address, or across a VPC peering connection, or via AWS PrivateLink. AWS pricing is complicated.
[4] This is, of course, largely speculative. Amazon and other public cloud providers are notoriously tight-lipped about how their networks are structured. It's likely that at least part of this cost goes to paying for network peering and transit arrangements between AWS and various ISPs, although it's also likely that some of this long-distance traffic is traversing a network that Amazon owns.