Thought I would share a few helpful hints to keep in mind when using EC2 and S3. Nothing mind blowing here, just some things worthy of note to the beginner. All of them born of fire managing Cascading / Hadoop clusters.
S3 connections fail inside EC2.
This is obvious, but worth pointing out. Even inside the EC2 network, your S3 connections can fail. If coding in Java, check out JetS3t. Also make use of MD5 checksums and set them in the S3 headers. There is no need to keep a '.md5' side file.
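If the header you use is Content-MD5, note that S3 expects the base64 encoding of the raw digest, not the usual hex string. Here is a minimal sketch of computing that value, assuming openssl is installed. The function name content_md5 is made up for illustration.

```shell
# Print the Content-MD5 header value for a file:
# the base64 encoding of the raw (binary) MD5 digest.
# content_md5 is a hypothetical helper name; assumes openssl is on the PATH.
content_md5() {
    openssl dgst -md5 -binary "$1" | openssl base64
}
```

Call it as content_md5 SOME_FILE and pass the result along with the upload. S3 will reject the request if the body does not match the digest.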
Chunk your S3 data.
You will get full bandwidth from S3 to an instance if you can open up multiple connections. There is a good description over on the RightScale blog.
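One way to act on this, sketched under assumptions: split a large file into fixed-size parts, then push the parts through several connections at once. split and xargs -P are standard tools. The helper names are made up, and the upload command is whatever S3 client you happen to use.

```shell
# Split FILE into parts of SIZE (default 64m), named FILE.part-aa, FILE.part-ab, ...
# chunk_file and each_part_parallel are hypothetical helper names.
chunk_file() {
    file=$1
    size=${2:-64m}
    split -b "$size" "$file" "$file.part-"
}

# Run a command once per part, at most NJOBS at a time.
# Usage: each_part_parallel FILE NJOBS command [args...]
# The part path is appended as the last argument of the command.
# Assumes part paths contain no whitespace.
each_part_parallel() {
    file=$1
    njobs=$2
    shift 2
    printf '%s\n' "$file".part-* | xargs -n 1 -P "$njobs" "$@"
}
```

For example, each_part_parallel big.dat 4 UPLOAD_CMD, where UPLOAD_CMD stands in for your client's upload command. The parts sort in name order, so cat big.dat.part-* puts the file back together on the other side.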
Leverage S3 TCP optimizations
Amazon just announced support for TCP Window Scaling and Selective Acknowledgement. It seems both tcp_sack and tcp_window_scaling are enabled by default in modern Linux kernels. Autotuning, however, is a feature of the 2.6.17 kernel, and EC2 is using 2.6.16. Here is a good TCP tuning reference.
Interestingly enough cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf shows it is enabled on my instances.
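To check all three settings on an instance in one go, something like this works. It is only a sketch, and the function name is made up. A value of 1 means the feature is enabled.

```shell
# Print the TCP settings discussed above as "name = value".
# show_tcp_settings is a hypothetical helper name. The optional argument
# overrides the directory to read from (defaults to the usual Linux location).
show_tcp_settings() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_sack tcp_window_scaling tcp_moderate_rcvbuf; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        else
            echo "$name = (not available)"
        fi
    done
}
```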
Some useful SSH options.
You would be surprised how likely it is that you will get the same public name as a previous session. It's also in Amazon's interest to kill connections that look idle. These options are useful.
SSH_OPTS="-i PRIVATE_KEY -o StrictHostKeyChecking=no -o ServerAliveInterval=30"
SSH tunnel into your EC2 cluster.
Since there isn't any correlation between the public and private names of EC2 instances, it's just easier to use the private names. If you need to access your machines via some client, use a SOCKS proxy.
ssh $SSH_OPTS -D 6666 -N "root@$PUBLIC_NAME"
Use FoxyProxy for accessing web servers in your cluster.
Once it is installed, use these rules. This allows you to keep all ports private except SSH (port 22). This works with the hint above. Just set your SOCKS server to 'localhost:6666'.
*compute-1.amazonaws.com*
*.ec2.internal*
*.compute-1.internal*
EC2 instances don't boot immediately.
It can take a few minutes before an EC2 instance boots. Even though the AWS API says the instance is 'running', it may not be reachable yet. You are also not guaranteed it will ever boot. This little snippet of Bash is quite useful.
while true; do
  REPLY=`ssh $SSH_OPTS "root@HOSTNAME" 'echo "hello"'`
  if [ ! -z "$REPLY" ]; then
    break
  fi
  sleep 5
done
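Since an instance may never come up, it can help to cap the number of attempts instead of looping forever. Here is a rough sketch of a generic retry helper. The name wait_until and its argument order are my own invention, not part of any tool.

```shell
# Retry a command until it succeeds, giving up after MAX_TRIES attempts.
# Usage: wait_until MAX_TRIES DELAY_SECONDS command [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
# wait_until is a hypothetical helper name.
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$try" -ge "$max_tries" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For example, wait_until 60 5 ssh $SSH_OPTS "root@HOSTNAME" true gives the instance roughly five minutes of waiting between attempts before giving up. Adding -o ConnectTimeout=10 to the SSH options keeps a single attempt from hanging.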
Build your own pristine images for EC2.
RightScale has published a few scripts for building fresh images from Fedora or CentOS. They can be found on this blog post. After trying both CentOS 5 and Fedora 6, I've found Fedora the most useful as there are more packages available. Having a short attention span, I reverted to Fedora in order to get a reliable Ganglia install.
After building a clean Fedora/CentOS image, use it as a base for derivative images. Don't rely on public AMIs you don't manage. Since Red Hat started offering paid AMIs, the Amazon-built CentOS image has disappeared.
Update 3-28-08: I kinda take this back. Amazon just released a few new base images using their new kernels. They scale much better, and are very lightweight. See my post on the new EC2 features.
Install Ganglia if managing a cluster
Ganglia is a cluster monitoring tool. Though the documentation is sparse, it is easy to install. Just keep in mind you cannot use multicast inside the EC2 network. So all slaves will need to open channels directly to their master. I'll let you decode my bash scripts.
To install, on Fedora:
yum -y install ganglia-gmetad ganglia-gmond ganglia-web httpd php
On your master:
sed -i -e "s|\( *mcast_join *=.*\)|#\1|" \
-e "s|\( *bind *=.*\)|#\1|" \
-e "s|\( *mute *=.*\)| mute = yes|" \
-e "s|\( *location *=.*\)| location = \"master-node\"|" \
/etc/gmond.conf
mkdir -p /mnt/ganglia/rrds
chown -R ganglia:ganglia /mnt/ganglia/rrds
rm -rf /var/lib/ganglia; cd /var/lib; ln -s /mnt/ganglia ganglia; cd
service gmond start
service gmetad start
apachectl start
On your slaves:
sed -i -e "s|\( *mcast_join *=.*\)|#\1|" \
-e "s|\( *bind *=.*\)|#\1|" \
-e "s|\(udp_send_channel {\)|\1\n host=$MASTER_HOST|" \
/etc/gmond.conf
service gmond start
Members of the same EC2 group cannot see each other by default.
EC2 instances in the same group cannot connect with each other until their group is 'authorized' to connect with itself. You must do this before you boot the instances.
ec2-authorize group_name -o group_name -u AWS_ACCOUNT_ID
Try not to use underscores in bucket names.
Bucket names can be used in the S3 domain name. Also, a useful URL format in application code for S3 is s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@BUCKET_NAME/KEY. This makes the bucket name part of the authority in the URL, or specifically, the host part. And domain names may not use underscores.
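A quick sanity check before creating a bucket can save some pain. This is a simplified sketch, not the full set of S3 naming rules: it accepts 3 to 63 characters drawn from lowercase letters, digits, hyphens and dots, and requires the name to start and end with a letter or digit. The function name is made up.

```shell
# Return 0 if NAME looks safe to use as the host part of an S3 URL.
# Simplified check only; is_dns_safe_bucket is a hypothetical helper name.
is_dns_safe_bucket() {
    name=$1
    len=${#name}
    [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
    case $name in
        # Any character outside the allowed set (underscore, uppercase, ...).
        *[!abcdefghijklmnopqrstuvwxyz0123456789.-]*) return 1 ;;
        # Must start and end with a letter or digit.
        [.-]*|*[.-]) return 1 ;;
    esac
    return 0
}
```

So is_dns_safe_bucket my-bucket succeeds while is_dns_safe_bucket my_bucket fails.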
I'll add more as they occur to me.