In a previous post, I discussed the problems I have run into with my thin client setup. One of the main issues has been moving data between machines, especially when I want to do some work locally. One partial solution I have found, which makes life a little easier, is boto, a Python library that provides an interface to Amazon Web Services (AWS). Its S3 API in particular has removed some of the pain points of the thin client setup. Instead of pushing data around manually, I can set up boto within my Python scripts so that they automatically retrieve their input data from S3 and save their results back to S3. This saves a lot of time when running temporary EC2 instances: rather than pushing and pulling data by hand, I can just load up the Python script (or, preferably, clone a GitHub repo; more on that later) and run the code.
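To make the retrieve-and-save-back workflow concrete, here is a minimal sketch of a push/pull wrapper. It is not the code from my gist: the `S3Store` class and its method names are hypothetical, and it assumes the boto 2 `Bucket` and `Key` methods `new_key`, `get_key`, `list`, `set_contents_from_filename` and `get_contents_to_filename`. Because it takes the bucket object as an argument, it can be exercised without touching S3.

```python
import os


class S3Store:
    """Hypothetical push/pull wrapper around a boto 2 Bucket object.

    Only uses the boto 2 calls Bucket.new_key, Bucket.get_key, Bucket.list,
    Key.set_contents_from_filename and Key.get_contents_to_filename.
    """

    def __init__(self, bucket):
        # Any object exposing the boto 2 Bucket interface works here.
        self.bucket = bucket

    def push(self, local_path, key_name=None):
        """Upload a local file; the key name defaults to the file's base name."""
        key_name = key_name or os.path.basename(local_path)
        key = self.bucket.new_key(key_name)
        key.set_contents_from_filename(local_path)
        return key_name

    def pull(self, key_name, local_path=None):
        """Download a key to disk; the local path defaults to the key's base name."""
        key = self.bucket.get_key(key_name)
        if key is None:
            # boto 2's get_key returns None when the key does not exist.
            raise KeyError(key_name)
        local_path = local_path or os.path.basename(key_name)
        key.get_contents_to_filename(local_path)
        return local_path

    def list_files(self, prefix=""):
        """Return the names of the keys in the bucket that start with prefix."""
        return [key.name for key in self.bucket.list(prefix=prefix)]
```

With a boto 2 connection `conn`, a job on a fresh EC2 instance might build `store = S3Store(conn.get_bucket('bucket-name'))`, call `store.pull('input.csv')`, run its computation, and finish with `store.push('results.csv')`; the key names are placeholders.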
I wrote some Python code that wraps the boto methods I use most often. I have posted it as a GitHub Gist, so I won't go over it in much detail here. At the moment, you can push data to and pull data from S3 using the push and pull methods, and I will soon be adding a few more, such as listing the files in a bucket. One issue I ran into was establishing a connection to S3. Amazon "prefers" a particular naming convention for S3 buckets so that names are DNS- and SSL-compliant, but it doesn't advertise this very well. The bucket I was trying to connect to was called 'belkin-data', and from what I gathered at the time, the '-' was outside the naming convention. (Hyphens are in fact allowed in DNS-compliant names; uppercase letters, underscores and dots are the more typical culprits, so something else may have been at play.) With default settings, the boto connection will fail if the naming conventions are not met. This is pretty ridiculous: the error message didn't make it obvious what was going on, and it was a fairly obscure error to look up on the web. Anyway, the fix was to change the calling format so that boto can connect to buckets with non-conventional names. I have posted the snippet of code below, with some comments, in case anyone else runs into this error.
```python
import boto
import boto.s3.connection

# I save my credentials in a Python file called credentials.py and import
# them. This lets me post the code publicly without modifying it
# (obviously I don't post the credentials.py file).
import credentials as cr

# Connect to S3
conn = boto.connect_s3(
    aws_access_key_id=cr.access_key,
    aws_secret_access_key=cr.secret_key,
    # Outside the US Classic region, set host to that region's endpoint
    host='s3-us-west-2.amazonaws.com',
    # The calling format needs to be changed to OrdinaryCallingFormat()
    # if your bucket name does not comply with Amazon's conventions
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

# Connect to the bucket
bucket = conn.get_bucket('bucket-name')
```
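The calling format decides where the bucket name goes in each request. The sketch below is an illustration, not boto code: `virtual_hosted_url` and `path_style_url` are hypothetical helpers that build the two URL styles (boto 2's default `SubdomainCallingFormat` puts the bucket in the hostname, while `OrdinaryCallingFormat` puts it in the path), and `is_dns_compliant` only approximates AWS's published bucket naming rules, which have been tightened over time. By this check 'belkin-data' is compliant. A name such as 'my.data.bucket' also passes, yet in the virtual-hosted style its dots add extra hostname labels that the endpoint's wildcard SSL certificate does not cover, which is the situation the path style avoids.

```python
import ipaddress
import re

# One DNS label: lowercase letters, digits and hyphens,
# starting and ending with a letter or digit.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_dns_compliant(bucket):
    """Rough approximation of AWS's DNS-compliant bucket naming rules."""
    if not 3 <= len(bucket) <= 63:
        return False
    try:
        ipaddress.IPv4Address(bucket)
        return False  # names formatted like IP addresses are not allowed
    except ValueError:
        pass
    # Splitting on "." also rejects leading, trailing or adjacent dots,
    # because they produce empty labels that fail the regex.
    return all(_LABEL.match(label) for label in bucket.split("."))


def virtual_hosted_url(bucket, key, host="s3.amazonaws.com"):
    """Bucket in the hostname, as with boto 2's SubdomainCallingFormat."""
    return "https://%s.%s/%s" % (bucket, host, key)


def path_style_url(bucket, key, host="s3.amazonaws.com"):
    """Bucket in the path, as with boto 2's OrdinaryCallingFormat."""
    return "https://%s/%s/%s" % (host, bucket, key)
```

For example, `path_style_url('belkin-data', 'results.csv', 's3-us-west-2.amazonaws.com')` gives `'https://s3-us-west-2.amazonaws.com/belkin-data/results.csv'`, so the hostname stays the same whatever the bucket is called.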
Once the connection was established, I didn't run into any major issues working with boto. As I mentioned earlier, this has made setting up quick EC2 instances much less painful. I was originally using bash scripts to push data around, which did the trick, but it is a lot nicer to be able to do this inside Python. In the next few days I will look into using boto to launch EC2 instances, which will hopefully solve another pain point (sitting around waiting for EC2 instances to initialize is a good excuse for a coffee break, but it can kill productivity).
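Launching instances with boto is left for a later post, but the waiting step can already be sketched. The function below is a hypothetical helper, not part of boto: it assumes a boto 2 `Instance` object (for example `reservation.instances[0]`, where `reservation` comes from `conn.run_instances(...)`) and relies only on `Instance.update()`, which refreshes the instance and returns its current state string. The timeout and poll interval are arbitrary defaults, and the sleep function is a parameter so the loop can be tested without actually waiting.

```python
import time


def wait_until_running(instance, timeout=300, poll_interval=5, sleep=time.sleep):
    """Poll a boto 2 EC2 Instance until it reports the 'running' state.

    Hypothetical helper: raises RuntimeError if the instance starts shutting
    down or terminates, and TimeoutError once `timeout` seconds have passed.
    """
    waited = 0
    state = instance.update()  # refreshes attributes, returns the state string
    while state != "running":
        if state in ("shutting-down", "terminated"):
            raise RuntimeError("instance entered state %r" % state)
        if waited >= timeout:
            raise TimeoutError("still %r after %d seconds" % (state, waited))
        sleep(poll_interval)
        waited += poll_interval
        state = instance.update()
    return state
```

Polling with a timeout keeps a script from hanging forever on an instance that never comes up, and the coffee break becomes optional.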