Hi all,

Is it possible to use the Cassandra ColumnFamilyInputFormat in combination
with a Hadoop "streaming" job?  The Hadoop docs say that you can specify
other plugins, e.g.:

-inputformat JavaClassName

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs

However it then says:

"The class you supply for the input format should return key/value pairs of
Text class."

Whereas the Cassandra Wiki says:

"Cassandra rows or row fragments (that is, pairs of key + SortedMap of
columns) are input to Map tasks for processing by your job"
http://wiki.apache.org/cassandra/HadoopSupport
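To make the mismatch concrete, here is a minimal, self-contained Java sketch (no Hadoop or Cassandra classes involved) of the kind of flattening an adapter would presumably have to do: turning a row key plus a SortedMap of columns into one tab-separated line of text, roughly the shape a streaming mapper reads on stdin. The `flattenRow` helper, the `name=value` encoding and the String-typed columns are all hypothetical; real Cassandra columns are binary, not Strings.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class RowFlattener {
    // Hypothetical helper: encodes one row (key + sorted columns) as
    // "key<TAB>col1=val1,col2=val2". No escaping is done, so values
    // containing ',' or '=' would be ambiguous; this is only a sketch.
    static String flattenRow(String key, SortedMap<String, String> columns) {
        StringBuilder sb = new StringBuilder(key).append('\t');
        boolean first = true;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // TreeMap keeps the columns sorted by name, loosely mirroring
        // how Cassandra returns columns in comparator order.
        SortedMap<String, String> cols = new TreeMap<>();
        cols.put("name", "example");
        cols.put("city", "London");
        System.out.println(flattenRow("row42", cols));
    }
}
```

Running `main` prints `row42`, a tab, then `city=London,name=example`, because the TreeMap iterates column names in sorted order.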

So I'm wondering whether this would work or whether it's just never going to
happen. I guess the alternative is to write the job as a Hadoop Java class,
but that is what I'm trying to avoid.

Has anyone got any examples of getting M/R working with Cassandra as the
input source?

Thanks

Dave
