Hey Dave,

This won't work out of the box, but it should be relatively easy to fix: implement a TextColumnFamilyInputFormat that wraps ColumnFamilyInputFormat and converts the data structures it outputs to JSON/TSV/CSV.
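The core of such a wrapper is the flattening step: turning a Cassandra row (key + SortedMap of columns) into a single line of text that streaming can treat as a Text value. A minimal sketch of just that step in plain Java (no Hadoop or Cassandra imports; the class and method names are hypothetical, and a real TextColumnFamilyInputFormat would do this inside a RecordReader that delegates to ColumnFamilyInputFormat):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch only: flatten one row into a tab-separated line,
// the Text-friendly shape Hadoop streaming expects from an input format.
public class RowToTsv {

    // Row key first, then each column as name=value, tab-separated.
    public static String toTsvLine(String rowKey, SortedMap<String, String> columns) {
        StringBuilder sb = new StringBuilder(rowKey);
        for (Map.Entry<String, String> e : columns.entrySet()) {
            sb.append('\t').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, String> cols = new TreeMap<String, String>();
        cols.put("name", "Dave");
        cols.put("city", "London");
        // TreeMap iterates in sorted column order, matching Cassandra's
        // sorted-columns contract for rows handed to Map tasks.
        System.out.println(toTsvLine("user42", cols));
        // prints: user42	city=London	name=Dave
    }
}
```

Real column values are bytes, not strings, so a production version would also need to pick an encoding (or base64) for non-text data before emitting lines.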
If you have time to work on this, there is an open ticket: https://issues.apache.org/jira/browse/CASSANDRA-1193

Thanks,
Stu

-----Original Message-----
From: "Dave Gardner" <dave.gard...@imagini.net>
Sent: Wednesday, June 16, 2010 10:55am
To: user@cassandra.apache.org
Subject: Cassandra / Hadoop

Hi all,

Is it possible to use the Cassandra ColumnFamilyInputFormat in combination with a Hadoop "streaming" job? The Hadoop docs say that you can specify other plugins, e.g.:

-inputformat JavaClassName

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs

However, they then say: "The class you supply for the input format should return key/value pairs of Text class."

Whereas the Cassandra wiki says: "Cassandra rows or row fragments (that is, pairs of key + SortedMap of columns) are input to Map tasks for processing by your job"

http://wiki.apache.org/cassandra/HadoopSupport

So I'm wondering whether this would work, or whether it's just never going to happen. I guess the alternative is to write a Hadoop Java class for the job, but that is what I'm trying to avoid. Has anyone got any examples of getting M/R working with Cassandra as an input source?

Thanks

Dave