Scripting Hadoop Jobs in Groovy


After much poking around and experimentation, we have packaged and released version 0.1.0 of Cascading.groovy, our Groovy language interpreter extension. Read more on the Cascading site.

We think this will be a great tool for groups that need to expose Hadoop to the 'casual' user: someone who needs to retrieve and manipulate valuable data on a Hadoop cluster, but doesn't have the time to learn Java or the Hadoop API, or to think in MapReduce to solve problems that are a notch or more above trivial.

It is worth mentioning here, even though it is also stated on the Cascading site, that no Groovy code runs in the cluster (on the slave nodes).

Groovy is used only as a configuration language for assembling complex workflows that are then run on Hadoop.

The best metaphor is Ant and its build files. You would use Ant to create your build process when developing Java code. Here we use Groovy to develop Data Set build files when executing on Hadoop.

Be warned: Cascading.groovy is likely still rough around the edges, but we think it's a great start, worthy of a deeper look by any current or would-be Hadoop user.
