Hey Dave,

This won't work out of the box, but it should be relatively easy to fix: implement a TextColumnFamilyInputFormat that wraps ColumnFamilyInputFormat and converts the data structures it outputs to JSON/TSV/CSV.
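The core of such a wrapper is the flattening step: turning a Cassandra row (key + SortedMap of columns) into a single line of text that streaming can treat as a Text value. A minimal sketch of just that step in plain Java (no Hadoop or Cassandra imports; the class and method names are hypothetical, and a real TextColumnFamilyInputFormat would do this inside a RecordReader that delegates to ColumnFamilyInputFormat):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch only: flatten one row into a tab-separated line,
// the Text-friendly shape Hadoop streaming expects from an input format.
public class RowToTsv {

    // Row key first, then each column as name=value, tab-separated.
    public static String toTsvLine(String rowKey, SortedMap<String, String> columns) {
        StringBuilder sb = new StringBuilder(rowKey);
        for (Map.Entry<String, String> e : columns.entrySet()) {
            sb.append('\t').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, String> cols = new TreeMap<String, String>();
        cols.put("name", "Dave");
        cols.put("city", "London");
        // TreeMap iterates in sorted column order, matching Cassandra's
        // sorted-columns contract for rows handed to Map tasks.
        System.out.println(toTsvLine("user42", cols));
        // prints: user42	city=London	name=Dave
    }
}
```

Real column values are bytes, not strings, so a production version would also need to pick an encoding (or base64) for non-text data before emitting lines.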
If you have time to work on this, there is an open ticket: https://issues.apache.org/jira/browse/CASSANDRA-1193

Thanks,
Stu

-----Original Message-----
From: "Dave Gardner" <dave.gard...@imagini.net>
Sent: Wednesday, June 16, 2010 10:55am
To: user@cassandra.apache.org
Subject: Cassandra / Hadoop

Hi all,

Is it possible to use the Cassandra ColumnFamilyInputFormat in combination with a Hadoop "streaming" job? The Hadoop docs say that you can specify other plugins, e.g.:

-inputformat JavaClassName

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs

However, they then say: "The class you supply for the input format should return key/value pairs of Text class."

Whereas the Cassandra wiki says: "Cassandra rows or row fragments (that is, pairs of key + SortedMap of columns) are input to Map tasks for processing by your job"

http://wiki.apache.org/cassandra/HadoopSupport

So I'm wondering whether this would work, or whether it's just never going to happen. I guess the alternative is to write a Hadoop Java class for the job, but that is what I'm trying to avoid. Has anyone got any examples of getting M/R working with Cassandra as an input source?

Thanks

Dave