What's in Amazon's buckets?
Wed 25th May 11
While catching up on some old Hak5 episodes I found the piece on Amazon's S3 storage. If you don't know what S3 is then I recommend going and watching the episode; it gives a good introduction and was all I'd had before starting this project. The thing that caught my eye, and Darren's, was when Jason mentioned that each bucket has to have a unique name across the whole of the S3 system. As soon as I heard that I was thinking: let's bruteforce some bucket names.
So I signed up for the free tier and started investigating. I created a couple of buckets and looked at the options: by default a bucket is private and only accessible by the owner, but you can add new permissions which make it publicly accessible. I made one bucket private and one public, then hit their URLs to see what would happen. This is what I got back:
Private bucket
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>7F3987394757439B</RequestId>
<HostId>kyMIhkpoWafjruFFairkfim383jtznAnwiyKSTxv7+/CIHqMBcqrXV2gr+EuALUp</HostId>
</Error>
Public bucket
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>digipublic</Name>
<Prefix></Prefix>
<Marker></Marker>
<MaxKeys>1000</MaxKeys>
<IsTruncated>false</IsTruncated>
</ListBucketResult>
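Just to illustrate, telling those two responses apart only needs a look at the returned XML. Here's a rough Python sketch (using the requests library; it isn't the actual script, and the bucket name is just my test one):

import requests
import xml.etree.ElementTree as ET

# Rough sketch, not the real script: classify a bucket by its response.
def check_bucket(name):
    resp = requests.get("http://s3.amazonaws.com/" + name)
    root = ET.fromstring(resp.content)
    if root.tag.endswith("ListBucketResult"):
        return "public"       # we got a directory listing back
    if root.findtext("Code") == "AccessDenied":
        return "private"      # bucket exists but we can't read it
    return "unassigned"       # e.g. NoSuchBucket (redirects are covered below)

print(check_bucket("digipublic"))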
So that's an easy test to build into a script. The next thing I looked at was the region. When you set a bucket up you can specify which of the five data centres the data is stored in, so your data is closer to your target audience. You get the following options:
- US Standard
- Ireland
- Northern California
- Singapore
- Tokyo
So I set up a bucket in each and accessed them all. The difference when accessing them is the hostname; this is the mapping:
- US Standard = http://s3.amazonaws.com
- Ireland = http://s3-eu-west-1.amazonaws.com
- Northern California = http://s3-us-west-1.amazonaws.com
- Singapore = http://s3-ap-southeast-1.amazonaws.com
- Tokyo = http://s3-ap-northeast-1.amazonaws.com
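In a script that mapping is nothing more than a lookup table, something like this:

# Region name -> S3 endpoint, as of May 2011
ENDPOINTS = {
    "US Standard":         "http://s3.amazonaws.com",
    "Ireland":             "http://s3-eu-west-1.amazonaws.com",
    "Northern California": "http://s3-us-west-1.amazonaws.com",
    "Singapore":           "http://s3-ap-southeast-1.amazonaws.com",
    "Tokyo":               "http://s3-ap-northeast-1.amazonaws.com",
}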
But as bucket names have to be unique across the whole of S3, what happens if you access a bucket in Tokyo with the hostname for Ireland?
<Error>
<Code>PermanentRedirect</Code>
<Message>
The bucket you are attempting to access must be addressed using the
specified endpoint. Please send all future requests to this endpoint.
</Message>
<RequestId>4834475949AFC737</RequestId>
<Bucket>digitokyo</Bucket>
<HostId>TC1DCxcxiejfiek33492034AqtEVBxr+1Oj0GJvmCktGVrlcdZz9YjX5wHMbITi2</HostId>
<Endpoint>digitokyo.s3-ap-northeast-1.amazonaws.com</Endpoint>
</Error>
They kindly redirect you to the correct hostname.
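Following that redirect in code just means pulling the Endpoint element out of the error and re-requesting. A rough sketch (the helper name is mine, not from the real script):

import requests
import xml.etree.ElementTree as ET

# Hypothetical helper: request a bucket at the default endpoint and,
# if S3 answers with a PermanentRedirect, retry at the endpoint it names.
def fetch_bucket(name):
    resp = requests.get("http://s3.amazonaws.com/" + name)
    root = ET.fromstring(resp.content)
    if root.tag == "Error" and root.findtext("Code") == "PermanentRedirect":
        # The Endpoint element already includes the bucket name,
        # e.g. digitokyo.s3-ap-northeast-1.amazonaws.com
        resp = requests.get("http://" + root.findtext("Endpoint") + "/")
    return resp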
With all this info I built up a script which takes a word list and runs through it, trying to access a bucket for each word. It nicely parses out the returned XML, follows redirections, and ends up with a list showing public, private and unassigned buckets.
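The core of it is just a loop over the word list. Roughly (a sketch only, with a made-up words.txt; the real script does the XML parsing and redirect handling described above):

import requests

# Sketch of the main loop: one candidate bucket name per line in words.txt.
for line in open("words.txt"):
    name = line.strip()
    if not name:
        continue
    resp = requests.get("http://s3.amazonaws.com/" + name)
    if resp.status_code == 200:
        print(name, "public")
    elif resp.status_code == 403:
        print(name, "private")
    elif resp.status_code == 301:
        print(name, "exists in another region, follow the redirect")
    else:
        print(name, "unassigned")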
That was good, but what about files? I put some files in my public bucket and hit its URL:
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>digipublic</Name>
<Prefix></Prefix>
<Marker></Marker>
<MaxKeys>1000</MaxKeys>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>my_file</Key>
<LastModified>2011-05-16T10:47:16.000Z</LastModified>
<ETag>"51fff3c9087648822c0a21212907934a"</ETag>
<Size>6429</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
</ListBucketResult>
That's a directory listing, and that is good!
I put some more files in, some private and some public, and they all showed up in the list. Trying to access the private files, though, resulted in a "403 Forbidden" and a bunch of XML similar to that for a private bucket. However, I can use this: by doing a HEAD request on each file in the directory listing I get back either a "200 OK" or a "403 Forbidden", which means I can now enumerate all the files and see which are public and which are private.
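That check is also only a few lines. A sketch, reusing my test bucket and the same ListBucketResult namespace as in the XML above:

import requests
import xml.etree.ElementTree as ET

NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"
bucket_url = "http://s3.amazonaws.com/digipublic"   # my test bucket

# Pull every <Key> out of the directory listing, then HEAD each file:
# 200 OK means the file is public, 403 Forbidden means it's private.
listing = ET.fromstring(requests.get(bucket_url).content)
for key in listing.iter(NS + "Key"):
    resp = requests.head(bucket_url + "/" + key.text)
    print(key.text, "public" if resp.status_code == 200 else "private")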
Quick summary... Given a word list I can check which buckets exist and, if they do, whether they are public or private. For all the public ones I can get a directory listing, and from that listing I can see which files are public and which are private. I think that is pretty good for a morning's work.
I called the script Bucket Finder and you can download it from its project page.
I've run the script a few times with some nice long word lists and got some interesting data back, but as this post is getting a bit long I'll stop here and you can read the analysis in Analysing Amazon's Buckets.