It looks like a new Lucene sub-project named Mahout has just been announced.
Per the announcement,
"Mahout's goal is to create a suite of practical, scalable machine learning libraries. Our initial plan is to utilize Hadoop (http://hadoop.apache.org) to implement a variety of algorithms including naive bayes, neural networks, support vector machines and k-Means, among others."
More welcome news indeed, as I expect these libraries will dovetail with Cascading quite nicely.