Mahout/Cassandra integration

Jeremy Hanna Thu, 04 Nov 2010 12:53:40 -0700

For people interested in using Cassandra with Mahout, there are a few possible 
integration points that could be fleshed out.  I was talking with Grant 
Ingersoll about this at ApacheCon and thought I would send out a note about it. 
The motivation would be to enhance Cassandra's analytics capabilities by 
running Mahout on data stored in Cassandra.


drivers - in the bin directory there is a script that loads drivers.  By 
default, those drivers feed the algorithms from sequence files read through 
the HDFS InputFormat.  They could possibly use Cassandra's InputFormat instead, 
or have a pluggable option.  I'm not sure where the output comes into play, but 
I would think that it could likewise just use the matching OutputFormat.
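
To make the "pluggable option" idea a bit more concrete, here is a minimal 
sketch in plain Java.  Every name in it (InputSource, SequenceFileSource, 
CassandraSource, InputSourceRegistry) is made up for illustration and is not 
part of Mahout, Hadoop or Cassandra; the two sources just return canned 
records.  In a real Hadoop job the choice would presumably come down to which 
InputFormat class the driver sets on the job.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstraction: where a driver reads its input records from.
interface InputSource {
    String name();
    List<String> readRecords();
}

// Stand-in for the current default (sequence files on HDFS).
final class SequenceFileSource implements InputSource {
    public String name() { return "hdfs"; }
    public List<String> readRecords() {
        return Collections.singletonList("record-from-sequence-file");
    }
}

// Stand-in for a source backed by Cassandra's InputFormat.
final class CassandraSource implements InputSource {
    public String name() { return "cassandra"; }
    public List<String> readRecords() {
        return Collections.singletonList("record-from-cassandra");
    }
}

// Maps an option value (e.g. from the command line) to a source.
final class InputSourceRegistry {
    private final Map<String, Supplier<InputSource>> factories = new HashMap<>();

    InputSourceRegistry() {
        register("hdfs", SequenceFileSource::new);
        register("cassandra", CassandraSource::new);
    }

    void register(String key, Supplier<InputSource> factory) {
        factories.put(key, factory);
    }

    // Falls back to the HDFS source when no option is given.
    InputSource create(String key) {
        String k = (key == null || key.isEmpty()) ? "hdfs" : key;
        Supplier<InputSource> factory = factories.get(k);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown input source: " + k);
        }
        return factory.get();
    }
}
```

A driver would then call something like new InputSourceRegistry().create(opt) 
and stay unaware of which backend it is reading from; the same shape would 
work for the output side.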

DataModel - 
https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html

DataStore - 
https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/classifier/bayes/interfaces/Datastore.html
Currently there are HBase and in-memory data stores; adding a Cassandra one 
would be a relatively simple integration point.
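
As a rough sketch of why a data store is a small integration surface, here is 
a simplified, hypothetical WeightStore interface with an in-memory 
implementation.  It is NOT Mahout's actual Datastore signature (see the 
javadoc above for that); the names and the matrix/row/column layout are 
assumptions chosen to show how the lookups might map onto Cassandra.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified store of named weight matrices.
// Not Mahout's actual Datastore interface.
interface WeightStore {
    double getWeight(String matrix, String row, String column);
    void setWeight(String matrix, String row, String column, double value);
}

// In-memory version, nested the way a Cassandra-backed one might be laid out:
// matrix -> column family, row -> row key, column -> column name.
final class InMemoryWeightStore implements WeightStore {
    private final Map<String, Map<String, Map<String, Double>>> data =
            new HashMap<>();

    // Missing entries are treated as a weight of 0.0.
    public double getWeight(String matrix, String row, String column) {
        Map<String, Map<String, Double>> rows = data.get(matrix);
        if (rows == null) {
            return 0.0;
        }
        Map<String, Double> columns = rows.get(row);
        if (columns == null) {
            return 0.0;
        }
        return columns.getOrDefault(column, 0.0);
    }

    // Creates the intermediate maps on first write.
    public void setWeight(String matrix, String row, String column,
                          double value) {
        data.computeIfAbsent(matrix, m -> new HashMap<>())
            .computeIfAbsent(row, r -> new HashMap<>())
            .put(column, value);
    }
}
```

A Cassandra-backed class would implement the same two methods with reads and 
writes against the cluster, and callers would not need to change.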

Another integration point in the future might be using Flume for output, which 
could then go through Flume to Cassandra via the Cassandra sink that Tyler 
Hobbs wrote - https://github.com/thobbs/flume-cassandra-plugin

Anyway, just wanted to relay that info.
