Thought I would quickly post this link to the Hadoop wiki comparing GridGain to Hadoop. In summary, Hadoop was designed for large data applications. GridGain is simply a re-imagining of tuple-spaces with constraints on available JVM memory (as implied by the comparison). Hopefully I'll post my own opinions at a later date. [Update] A reaction to the comparison has been posted.
Recently in Tools Category
There is much buzz about the recently announced feature additions to EC2, namely Elastic IP Addresses and Availability Zones. But if you look closely, you will see that four new features were added. The other two are User Selectable Kernels and New Public AMIs and Kernels (32-bit and 64-bit).
Thought I would share a few helpful hints to keep in mind when using EC2 and S3. Nothing mind-blowing here, just some things worth noting for the beginner. All of them were born of fire while managing Cascading / Hadoop clusters.
I'm not a fan of the name 'Cloud Computing'. As a metaphor it dissipates rather quickly. Nevertheless, IBM has recently announced its new cloud initiative, Blue Cloud. And Sun will announce the private beta of its own, Project Caroline, tomorrow (Feb 21). Competition with Amazon is welcome. But more welcome is direct support for Hadoop in both of these infrastructures. And, by extension, more reasons to use Cascading.
Looks like a new Lucene sub-project named Mahout has just been announced.
As a concrete extension to my thoughts on Wide Virtualization, I've started a new project called Cascading. Simply put, it is a pipe and filter abstraction over map/reduce as implemented by Hadoop.
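To make the pipe and filter idea a little more concrete, here is a minimal in-memory sketch in plain Python. It is not Cascading's API and it does not touch Hadoop; the names `map_stage`, `reduce_stage`, and `pipe` are hypothetical and exist only for illustration. Each stage is a filter over a stream of records, and a pipe just chains them.

```python
from collections import defaultdict


def map_stage(fn):
    """Filter: apply fn to each record; fn yields zero or more (key, value) pairs."""
    def stage(records):
        for record in records:
            yield from fn(record)
    return stage


def reduce_stage(fn):
    """Filter: group incoming (key, value) pairs by key, then fold each group with fn."""
    def stage(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        # Sorting stands in for the shuffle/sort step between map and reduce.
        for key in sorted(groups):
            yield key, fn(key, groups[key])
    return stage


def pipe(*stages):
    """Chain filters so the output stream of one feeds the next."""
    def run(records):
        stream = records
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Hypothetical word-count flow assembled from the two filters above.
word_count = pipe(
    map_stage(lambda line: ((word.lower(), 1) for word in line.split())),
    reduce_stage(lambda key, values: sum(values)),
)

if __name__ == "__main__":
    for word, count in word_count(["a b a", "B c"]):
        print(word, count)
```

The point of the sketch is only the shape: the flow is described as a chain of small filters, and the map and reduce steps become details of individual stages rather than the thing you program against directly.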
Seems the guys at IBM have been busy with Hadoop. They just announced Jaql, "a new query language for JSON data". This should turn out to be quite handy, as I have already found it useful to store Hadoop data in a JSON format.
Have started using JGraphT on my latest project and am excited to report that I really like it. It allows me to build directed (and undirected) graphs of my own class instances without changing my code. Plus it includes algorithms for shortest paths and transitive closure, as well as breadth-first and depth-first iterators. And the gravy on top is its support for writing DOT files, which can be imported into OmniGraffle. Brilliant...
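For readers who haven't used a graph library like this, the following is a rough plain-Python sketch of the same ideas: a directed graph over your own (hashable) objects, a breadth-first iterator, an unweighted shortest path, and DOT output. It is not JGraphT and does not mirror its API; `DirectedGraph`, `Task`, and every method name here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Stand-in for 'my own class'; frozen so instances are hashable."""
    name: str

    def __str__(self):
        return self.name


class DirectedGraph:
    """Adjacency-list digraph keyed by arbitrary hashable vertex objects."""

    def __init__(self):
        self._successors = {}

    def add_vertex(self, vertex):
        self._successors.setdefault(vertex, [])

    def add_edge(self, source, target):
        self.add_vertex(source)
        self.add_vertex(target)
        if target not in self._successors[source]:
            self._successors[source].append(target)

    def breadth_first(self, start):
        """Yield vertices reachable from start in breadth-first order."""
        seen = {start}
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            yield vertex
            for nxt in self._successors[vertex]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    def shortest_path(self, start, goal):
        """Fewest-edges path from start to goal, or None if unreachable."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            if vertex == goal:
                path = []
                while vertex is not None:
                    path.append(vertex)
                    vertex = parents[vertex]
                return path[::-1]
            for nxt in self._successors[vertex]:
                if nxt not in parents:
                    parents[nxt] = vertex
                    queue.append(nxt)
        return None

    def to_dot(self, name="G"):
        """Render the edges as a DOT digraph (isolated vertices are omitted)."""
        lines = [f"digraph {name} {{"]
        for source, targets in self._successors.items():
            for target in targets:
                lines.append(f'  "{source}" -> "{target}";')
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    extract, transform, load = Task("extract"), Task("transform"), Task("load")
    graph = DirectedGraph()
    graph.add_edge(extract, transform)
    graph.add_edge(transform, load)
    graph.add_edge(extract, load)
    print(graph.to_dot())
```

The `Task` class needs nothing graph-specific beyond being hashable, which is the property that makes "graphs of my own class instances without changing my code" possible.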
As mentioned briefly in a previous post, Pig is now available through the Apache incubator.
Looks like IntelliJ IDEA will be supporting the JavaScript E4X syntax in a future release. See the JavaScript E4X syntax support feature request. Great news. Now let's get it shipped in a near-future JDK release.