Cassandra / Hadoop

Dave Gardner Wed, 16 Jun 2010 08:56:16 -0700

Hi all,

Is it possible to use the Cassandra ColumnFamilyInputFormat with a Hadoop
"streaming" job?  The Hadoop docs say that you can specify other plugins,
e.g.:


-inputformat JavaClassName

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs

However it then says:

"The class you supply for the input format should return key/value pairs of
Text class."

Whereas the Cassandra Wiki says:

"Cassandra rows or row fragments (that is, pairs of key + SortedMap of
columns) are input to Map tasks for processing by your job"
http://wiki.apache.org/cassandra/HadoopSupport

So I'm wondering whether this could work or whether it's just never going to
happen. I guess the alternative is to write the job as a Hadoop Java class,
but that is what I'm trying to avoid.
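
To make the mismatch concrete, here is a rough sketch of the kind of streaming
mapper I'd like to be able to write. It assumes something has already
flattened each Cassandra row into a single Text line of the form
"rowkey<TAB>name=value,name=value". That line format is purely my own
invention for illustration; as far as I can tell ColumnFamilyInputFormat does
not produce it, and the names parse_row, map_row and run are hypothetical.

```python
import sys


def parse_row(line):
    """Split one 'rowkey<TAB>name=value,name=value' line into (key, columns).

    The line format is an assumption made for this sketch, not something
    Cassandra or Hadoop is known to emit.
    """
    key, _, flat = line.rstrip("\n").partition("\t")
    columns = {}
    for pair in flat.split(","):
        if pair:
            name, _, value = pair.partition("=")
            columns[name] = value
    return key, columns


def map_row(key, columns):
    """Example map step: emit the row key and the number of columns it has."""
    return "%s\t%d" % (key, len(columns))


def run(stream=sys.stdin, out=sys.stdout):
    """Read flattened rows from stream and write one mapped line per row."""
    for line in stream:
        if line.strip():
            out.write(map_row(*parse_row(line)) + "\n")

# A real streaming mapper script would end with a call to run().
```

The part I don't have is whatever sits between ColumnFamilyInputFormat and a
script like this, turning the key + SortedMap pairs into Text.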

Has anyone got any examples of getting M/R working with Cassandra as an
input source?

Thanks

Dave
