What's in Amazon's buckets?

Wed 25th May 11

While catching up on some old Hak5 episodes I found the piece on Amazon's S3 storage. If you don't know what S3 is then I recommend going and watching the episode; it gives a good introduction and was all I'd had before starting this project. The thing that caught my eye, and Darren's, was when Jason mentioned that each bucket has to have a unique name across the whole of the S3 system. As soon as I heard that I was thinking: let's brute-force some bucket names.

So I signed up for the free tier and started investigating. I created a couple of buckets and looked at the options: by default a bucket is private and only accessible by the owner, but you can add new permissions which make the bucket publicly accessible. I made one bucket private and one public, then hit their URLs to see what would happen. This is what I got back:

Private bucket

    <Message>Access Denied</Message>

Public bucket

    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

There is an obvious difference between the two, so that will be easy to test for in a script. The next thing I looked at was the region. When you set a bucket up you can specify which of the five data centres the data is stored in, so your data is closer to your target audience. You get the following options:

  • US Standard
  • Ireland
  • Northern California
  • Singapore
  • Tokyo

So I set up a bucket in each and accessed them all. The difference when accessing them is the hostname; this is the mapping:

  • US Standard = http://s3.amazonaws.com
  • Ireland = http://s3-eu-west-1.amazonaws.com
  • Northern California = http://s3-us-west-1.amazonaws.com
  • Singapore = http://s3-ap-southeast-1.amazonaws.com
  • Tokyo = http://s3-ap-northeast-1.amazonaws.com
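
That mapping is simple enough to capture in code. Here's a minimal Python sketch of it (the dictionary just restates the list above, and the helper builds the same path-style URLs used throughout this post):

    # Region names mapped to their S3 endpoints, as listed above.
    S3_ENDPOINTS = {
        "US Standard":         "http://s3.amazonaws.com",
        "Ireland":             "http://s3-eu-west-1.amazonaws.com",
        "Northern California": "http://s3-us-west-1.amazonaws.com",
        "Singapore":           "http://s3-ap-southeast-1.amazonaws.com",
        "Tokyo":               "http://s3-ap-northeast-1.amazonaws.com",
    }

    def bucket_url(bucket, endpoint="http://s3.amazonaws.com"):
        # Path-style addressing: http://<endpoint>/<bucket>
        return "%s/%s" % (endpoint, bucket)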

But as the bucket names have to be unique across the whole of S3, what happens if you access a bucket in Tokyo with the hostname for Ireland?

		The bucket you are attempting to access must be addressed using the
		specified endpoint. Please send all future requests to this endpoint.

They kindly redirect you to the correct hostname.
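
The "redirect" arrives as an XML error document pointing at the right host, so the script needs to pick the new hostname out of the response and try again. Here's a rough Python sketch, assuming the error body includes an <Endpoint> element naming that host:

    import xml.etree.ElementTree as ET
    import requests

    def follow_s3_redirect(resp):
        # Pull the correct host out of the redirect error body and retry there.
        root = ET.fromstring(resp.content)
        endpoint = root.findtext("Endpoint")
        if endpoint is None:
            return resp
        return requests.get("http://%s/" % endpoint)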

With all this info I built up a script which takes a word list and runs through it, trying to access a bucket for each word. It nicely parses out the returned XML, follows redirections, and ends up with a list showing public, private and unassigned buckets.
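
Here is a stripped-down Python sketch of that loop, not the actual Bucket Finder code but the same idea, with the status codes interpreted as described above (and 404 taken to mean the name is unassigned):

    import requests

    def check_bucket(name, endpoint="http://s3.amazonaws.com"):
        # Classify a bucket name as public, private or unassigned.
        r = requests.get("%s/%s" % (endpoint, name))
        if r.status_code == 200:
            return "public"      # got a ListBucketResult back
        if r.status_code == 403:
            return "private"     # the bucket exists but Access Denied
        if r.status_code == 404:
            return "unassigned"  # nobody owns this name yet
        if r.status_code == 301:
            return "redirected"  # lives on another endpoint, follow it
        return "unknown (%d)" % r.status_code

    def run_wordlist(path):
        with open(path) as wordlist:
            for word in wordlist:
                name = word.strip()
                if name:
                    print("%s: %s" % (name, check_bucket(name)))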

That was good, but what about files? I put some files in my public bucket and hit its URL:

    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

That is a directory listing, and that is good!

I put some more files in, some private and some public, and they all showed up in the list. Trying to access the private files, though, resulted in a "403 Forbidden" being returned along with a bunch of XML similar to that for a private bucket. However, I can use this: by doing a HEAD on each file in the directory listing I get either a "200 OK" or a "403 Forbidden", which means I can now enumerate all the files to see whether they are public or private.
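
As a sketch of that file check, carrying on with the same Python approach (the namespace is the one visible in the ListBucketResult tag earlier, and keys with awkward characters would need URL-encoding first):

    import requests
    import xml.etree.ElementTree as ET

    NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

    def check_files(bucket_url):
        # Grab the directory listing, then HEAD each key:
        # 200 means the file is public, 403 means it is private.
        listing = requests.get(bucket_url)
        root = ET.fromstring(listing.content)
        for key in root.findall(NS + "Contents/" + NS + "Key"):
            head = requests.head("%s/%s" % (bucket_url.rstrip("/"), key.text))
            state = "public" if head.status_code == 200 else "private"
            print("%s: %s" % (key.text, state))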

Quick summary... Given a word list I can check which buckets exist and, if they do, whether they are public or private. For all the public ones I can get a directory listing, and from that listing I can see which files are public and which are private. I think that is pretty good for a morning's work.

I called the script Bucket Finder and you can download it from its project page.

I've run the script a few times with some nice long word lists and got some interesting data back, but as this post is getting a bit long I'll stop here and you can read the analysis in Analysing Amazon's Buckets.