The AWS Data Ecosystem Grand Tour - Data Security
Written by Alex Rasmussen on March 26, 2020
This article is part of a series. Here are the rest of the articles in that series:
- Introduction
- Where Your AWS Data Lives
- Block Storage
- Object Storage
- Relational Databases
- Data Warehouses
- Data Lakes
- Key/Value and "NoSQL" Stores
- Graph Databases
- Time-Series Databases
- Ledger Databases
- SQL on S3 and Federated Queries
- Search
- Streaming Data
- File Systems
- Data Ingestion
- ETL
- Processing
- Data Interfaces
- Training Data for Machine Learning
- Data Security
- Business Intelligence
S3 makes it easy for anyone in your organization to store massive amounts of data. With that great power, however, comes great responsibility. The ease of access and virtually unlimited storage that cloud storage systems like S3 provide also make it easy for a malicious person - be that a hacker or a disgruntled employee - to walk away with a lot of sensitive data that has accidentally been left unsecured. That sensitive data can be severely damaging - to your business, your customers, or both - if it falls into the wrong hands. Personally identifiable information (PII) and protected health information (PHI) are both subject to an ever-growing body of regulation that gives customers more protection and imposes steeper penalties in the event of a breach.
The problem of securing your sensitive data in S3 is hard enough, but the truth is that many organizations don't even know what sensitive data they have or where that data resides. S3 buckets are easy to create, so it's easy to get into a situation where you have a lot of buckets of unknown provenance containing who-knows-what.
Any effective strategy for keeping your data under your control involves three important steps: knowing what data you've got, knowing which data is sensitive, and treating that sensitive data in a secure, consistent, and compliant way. You probably won't be able to do any of these three steps perfectly, especially inside a really big organization, but anything that makes these steps easier to perform is helpful.
AWS's solution for securing data in S3 is called Amazon Macie. At its core, Macie consists of a couple of classifiers. The first classifier has been trained to recognize many different kinds of sensitive data, including source code, API keys, unencrypted backups, passwords, system logs, and Social Security numbers. That classifier is used to scan your S3 data looking for sensitive stuff. The second classifier has been trained to identify risky or suspicious S3 API activity in your organization's CloudTrail logs. It looks at things like API call type, call frequency, time, and location, and automatically flags high-risk behavior that might indicate that an attack is underway.
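Macie's content classifier is a trained model, not a list of patterns. Still, it helps to see what "scanning for sensitive stuff" looks like at its crudest. Here's a minimal sketch that swaps in simple regular expressions for two of the data types mentioned above: U.S. Social Security numbers and AWS access key IDs (which start with `AKIA` followed by 16 uppercase letters or digits). The `SENSITIVE_PATTERNS` table and `scan_text` function are hypothetical names for illustration, not part of Macie.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns standing in for a trained classifier.
SENSITIVE_PATTERNS: Dict[str, re.Pattern] = {
    # A U.S. Social Security number written as AAA-GG-SSSS.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # An AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_text(text: str) -> Dict[str, List[str]]:
    """Return every match for each sensitive-data pattern found in `text`."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


if __name__ == "__main__":
    sample = "customer=Jane ssn=123-45-6789 key=AKIAIOSFODNN7EXAMPLE"
    print(scan_text(sample))
```

In practice you'd run something like this over the contents of objects read from S3, and you'd quickly discover its weakness: any string shaped like `123-45-6789` gets flagged whether or not it's an SSN. Cutting down on those false positives is a big part of why a trained classifier is worth having.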
If either of Macie's classifiers detects that something is amiss, it can trigger an alert and optionally notify an external incident management system. This external system can then be used to send further alerts, trigger compliance rules, generate reports, or reconfigure the S3 buckets themselves. Macie doesn't do most of these things itself, but knowing AWS, I suspect they'll have a managed incident management service at some point.
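To make the API-traffic side and the alerting hand-off a bit more concrete, here's a minimal rule-based sketch. It replaces Macie's trained model with two hand-written heuristics that are my own assumptions: one principal making more than a fixed number of calls in a batch, and a bucket-permission change (such as `DeleteBucketPolicy`) coming from an IP address outside an allow-list. The record fields (`eventName`, `sourceIPAddress`, `userIdentity.arn`) follow CloudTrail's event format; the thresholds, the `flag_events` function, and the `notify` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical hand-written rules standing in for Macie's trained model.
SENSITIVE_CALLS = {"DeleteBucketPolicy", "PutBucketAcl", "PutBucketPolicy"}
MAX_CALLS_PER_PRINCIPAL = 1000  # assumed per-batch threshold, not a Macie setting


def principal(record: Dict) -> str:
    """Return the caller's ARN, or a placeholder when the record omits it."""
    return record.get("userIdentity", {}).get("arn", "unknown")


def flag_events(
    records: List[Dict],
    trusted_ips: Set[str],
    notify: Callable[[Dict], None],
) -> List[Dict]:
    """Flag suspicious CloudTrail records and pass each alert to `notify`."""
    alerts: List[Dict] = []

    # Rule 1: one principal issuing an unusually large number of calls.
    for arn, count in Counter(principal(r) for r in records).items():
        if count > MAX_CALLS_PER_PRINCIPAL:
            alerts.append({"reason": "high_call_volume", "principal": arn, "count": count})

    # Rule 2: a bucket-permission change coming from an untrusted address.
    for r in records:
        if r.get("eventName") in SENSITIVE_CALLS and r.get("sourceIPAddress") not in trusted_ips:
            alerts.append({
                "reason": "sensitive_call_from_untrusted_ip",
                "principal": principal(r),
                "event": r["eventName"],
                "ip": r.get("sourceIPAddress"),
            })

    # Forward every alert to the external incident-management hook.
    for alert in alerts:
        notify(alert)
    return alerts


if __name__ == "__main__":
    events = [
        {"eventName": "GetObject", "sourceIPAddress": "10.0.0.5",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
        {"eventName": "DeleteBucketPolicy", "sourceIPAddress": "203.0.113.9",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/mallory"}},
    ]
    flag_events(events, trusted_ips={"10.0.0.5"}, notify=print)
```

Here `notify=print` just writes each alert to stdout; in a real setup, that callback is where you'd hand off to whatever incident management system you use. CloudTrail log files delivered to S3 wrap their events in a top-level `Records` array, so that list is what you'd pass in as `records`.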
This additional security comes at a high price. You're charged several dollars for every GB of content the data sensitivity classifier classifies, and several dollars more for every million CloudTrail events the API call classifier scans. If you want to keep classification metadata for your S3 objects for longer than 30 days, that'll cost extra. For large S3 buckets, that's an awful lot of money, but compared to millions in restitution and incalculable loss of customer goodwill, it's a drop in the bucket.
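To see how quickly that adds up, here's a back-of-the-envelope sketch. The two rates below are illustrative placeholders standing in for "several dollars," not AWS's published Macie prices (check the Macie pricing page for real numbers), and the `estimate_cost` function is a hypothetical helper.

```python
# Illustrative placeholder rates, NOT AWS's actual Macie pricing.
PRICE_PER_GB_CLASSIFIED = 5.00    # assumed dollars per GB of content classified
PRICE_PER_MILLION_EVENTS = 4.00   # assumed dollars per million CloudTrail events scanned


def estimate_cost(gb_classified: float, cloudtrail_events: int) -> float:
    """Rough cost of one classification pass under the placeholder rates above."""
    content_cost = gb_classified * PRICE_PER_GB_CLASSIFIED
    event_cost = (cloudtrail_events / 1_000_000) * PRICE_PER_MILLION_EVENTS
    return content_cost + event_cost


if __name__ == "__main__":
    # A hypothetical 10 TB (10,240 GB) data set plus 50 million API events.
    print(f"${estimate_cost(10_240, 50_000_000):,.2f}")  # prints $51,400.00
```

Even with made-up rates, the shape of the result is the point: the content term grows linearly with data volume and quickly dwarfs the event term, which is why classifying a large bucket is where the bill gets big.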
If you're looking for an alternative to Macie, my friends at Divebell are building a product for managing the entire personal data protection lifecycle. They're a great team and a product worth watching, and they didn't pay me to say that.
Next: Business Intelligence
In this article, we took a look at AWS's solution for securing access to data in S3. In the next and final(!) article in this series, we'll take a look at Amazon QuickSight, AWS's business intelligence tool.
If you'd like to get notified when new articles in this series get written, please subscribe to the newsletter by entering your e-mail address in the form below. You can also subscribe to the blog's RSS feed. If you'd like to talk more about any of the topics covered in this series, please contact me.