What is it?
If you are unfamiliar with Amazon Web Services (AWS) or their Elastic Load Balancer (ELB) service, ELB is a load-balancing service that you can use to spread incoming traffic across many different EC2 server instances.
ELB, like all things in the AWS cloud, is a dynamic service that scales up and down on its own (managed by AWS internally) based on the number of inbound requests you have.
If your site has a trickle of traffic, the ELB AWS deploys for you in the background is tiny and routes a small amount of traffic. If you suddenly get on the front page of Slashdot, Hacker News and Reddit, as the traffic comes SCREAMING in, AWS will see the increase and swap out your tiny Elastic Load Balancer for a much bigger one (or a group of them).
This transition process, when load balancers are swapped in and out, can lead to small windows of downtime during which clients cannot reach your site. If your traffic scales up gradually, the AWS deployment of ELBs is relatively smooth, with little to no downtime (it just doesn’t react well to huge, instant spikes).
How does it happen?
As you might imagine, when ELB instances are swapped in and out, the IP addresses of the actual load-balancing servers that clients connect to change.
Amazon works around this by setting very low TTLs (time-to-live) on the domain name mappings for those machines, but clients that cache the IP address and do not honor the TTL may keep trying to connect to the old ELB that your app was using after it has been replaced by another one.
Considering that the old ELB may have been reassigned to a new customer, this is where you end up with the frequent forum question: “Why am I getting all this traffic for a different site?”
For example, one customer ended up receiving Netflix’s API traffic from one (or more) clients for 4 days, accounting for 30% of their system’s daily traffic. That is all bandwidth and server capacity that the customer has to pay for, spent on traffic that doesn’t belong to them.
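The client-side half of this problem, holding on to an IP address past its TTL, can be illustrated with a minimal sketch of a lookup cache that re-resolves once the TTL has expired. The `Resolver` interface, the `TtlRespectingCache` class and the injected clock are all hypothetical names used for illustration; Python's standard `socket.getaddrinfo()` does not expose a record's TTL, so a real client would need a DNS library that does.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical resolver interface: takes a hostname and returns
# (ip_address, ttl_seconds). The standard library does not expose TTLs,
# so this stands in for whatever DNS library a real client would use.
Resolver = Callable[[str], Tuple[str, float]]


class TtlRespectingCache:
    """Caches hostname -> IP lookups, but never past the record's TTL."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # hostname -> (ip_address, expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # entry is still within its TTL
        # Entry is missing or expired: ask DNS again instead of reusing
        # an address that may now belong to someone else's load balancer.
        ip, ttl = self._resolve(host)
        self._entries[host] = (ip, now + ttl)
        return ip
```

A client that pins the first address it ever resolved behaves like this cache with an infinite TTL, which is exactly the behavior that sends traffic to a recycled ELB.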
Why is this bad?
We have written in the past that this design can actually lead to unintended security problems in the AWS cloud.
While this issue doesn’t guarantee that your site will get hacked or your customers compromised, if you are not careful with your design it is very possible that you are leaking information about your users to unintended listeners that the AWS ELB system has placed in the path of your traffic.
What are the security implications?
When you deploy on Amazon Web Services with Elastic Load Balancers, you need to assume that random (untrusted) sources are reading client requests to your server (inbound traffic).
If you assume this and design around that security hole, you should be safe.
For example, don’t allow customers to send passwords or credentials in plain-text; someone listening to this traffic could log all of these requests and then have your users’ credentials.
Don’t allow session IDs to be sent in plain-text either; that can lead to session hijacking or side-jacking.
If you are accepting plain-text passwords or session IDs, provide some level of client/server verification (e.g. a public/private-key signature, or an HMAC computed over the request with a shared secret).
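As a rough illustration of the HMAC option, here is a minimal sketch of signing and verifying a request with a shared secret, using only Python's standard `hmac` and `hashlib` modules. The function names, the fields included in the signed message and the 300-second skew window are assumptions made for this example, not a standard scheme.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, method: str, path: str,
                 body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature over the parts of the request."""
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        str(timestamp).encode(),
        body,
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   timestamp: int, signature: str,
                   now: Optional[int] = None, max_skew: int = 300) -> bool:
    """Check the signature and reject requests with a stale timestamp."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > max_skew:
        return False  # too old (or too far in the future): limits replays
    expected = sign_request(secret, method, path, body, timestamp)
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Note that a signature like this only stops a listener from forging or altering requests; it does not hide the request contents, so anything sensitive in the body still needs encryption (HTTPS).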
If you distill this problem down to a well-known attack vector, this is essentially AWS inserting a man-in-the-middle into your environment. If that man-in-the-middle happens to be nefarious, you are opening yourself up to a man-in-the-middle attack.
How do I secure my web app?
Design your app to be resilient in the face of a man-in-the-middle attacker, using the same security practices you would use against any such attack.
For example, if you are deploying a RESTful web service, consider securing it with 2-legged OAuth or HMAC-signed requests; for a standard web application, only allow logins over HTTPS.
Update #1: Eran over at Forecast: Cloudy had a great suggestion on how to address this as well: configure your services to only respond to requests targeted at your hostname.
Eran has also gone a step further and filed a feature request with the AWS team to add “domain protection” to ELBs natively.
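Until something like that exists natively, the "only respond to requests targeted at your hostname" idea can be sketched at the application layer. The example below is a small WSGI middleware, assuming a Python web app; the `HostFilter` class name and the choice of a 400 response are illustrative, and the port-stripping logic only handles plain hostnames (not IPv6 literals).

```python
from typing import Callable, Iterable, List


class HostFilter:
    """WSGI middleware that rejects requests whose Host header is not ours."""

    def __init__(self, app: Callable, allowed_hosts: Iterable[str]) -> None:
        self._app = app
        self._allowed = {host.lower() for host in allowed_hosts}

    def __call__(self, environ: dict, start_response: Callable) -> List[bytes]:
        # HTTP_HOST may carry a port ("www.example.com:443"); drop it.
        host = environ.get("HTTP_HOST", "").partition(":")[0].lower()
        if host not in self._allowed:
            # Traffic meant for someone else (or sent to a bare IP):
            # refuse it instead of handing it to the application.
            start_response("400 Bad Request",
                           [("Content-Type", "text/plain")])
            return [b"Unknown host\n"]
        return self._app(environ, start_response)
```

Wrapping an existing WSGI application would look like `application = HostFilter(application, ["www.example.com"])`, where `www.example.com` is a placeholder for your own hostname. This protects you from serving someone else's stray traffic; it does nothing for your own clients that are still talking to an address you no longer hold.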
If you are new to AWS, this isn’t the end of the world. As long as you are aware of this behavior and can protect against it, your app will be stronger for it.
Much like designing for the cloud forced us to learn about high-availability, load balancing and regional replication to make our apps more resilient, something like this forces us to think about the security of our applications.
Which is a good thing.
Update #2: Spencer, an AWS employee, posted recently that the EC2 team has a few enhancements to AWS in the works that will specifically address this issue moving forward.