Recently in Technical Category

Hadoop Quotes

A couple of interesting quotes from the Hadoop user list. Not indicative of anything in particular, but noteworthy.

Hadoop & EC2

Hadoop 0.17.0 is now generally available. This also means there are new scripts for managing EC2 clusters, with support for new EC2 features such as availability zones, the new optimized kernels, and 32- and 64-bit images, as well as Ganglia. It also looks like Tom has already packaged new public AMIs. You can read about the changes on the Hadoop Wiki EC2 page, and the JIRA issue has the patches.

Thought I would share a few helpful hints to keep in mind when using EC2 and S3. Nothing mind-blowing here, just some things worthy of note to the beginner. All of them were born of fire while managing Cascading / Hadoop clusters.

Check out theinfo.org; it's "for people with large data sets".

A couple of quick links worth sharing. The first is a BusinessWeek article, Wisdom of Clouds, which discusses in part how Hadoop is entering the classroom. The second is a brief perspective on MapReduce in Communications of the ACM, The Data Center Is The Computer.

Hoping to get myself a Python binary that runs on my Infrant NAS device, I built out a cross compiler on an EC2 instance and created an AMI for it. Now I have a SPARC Python binary, compiled on Linux with the Infrant patches for glibc.

Infrant + SSH

Finally, Infrant has updated its firmware to include support for SSH. Sadly, it wasn't the kind of support I was hoping for. You can enable root access and shell in for various tasks. But from what I can tell, you cannot use the web interface to initiate backups over scp, or simply rsync over ssh, from the Infrant server.

Slicehost

Finally got my slice at Slicehost and am slowly migrating from my 5-year-old FreeBSD jail to a fancy new Xen instance running CentOS. Would love to stay on FreeBSD, but alas, it isn't offered. Completed the DNS migration last night; let's see how the changes propagate.

Pig Now Incubating

As mentioned briefly in a previous post, Pig is now available through the Apache incubator.

For years I've used or relied upon virtualization technologies such as User Mode Linux, FreeBSD Jail, VMware, Parallels, and Xen (by virtue of EC2). All of these represent a kind of vertical or narrow virtualization, where you have a single box and slide as many virtual systems into it as possible. The converse of this is a kind of wide virtualization that spans many machines.