+1. We do a lot of this with Pig, joining across several column families, and Pig makes it just work. I think Hive does something similar. Unless you really need that much control over the process, I would use one of those two.
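For anyone weighing the hand-rolled route, here is a rough sketch of the reduce-side join that Pig and Hive perform conceptually. Each input is mapped separately, every row is emitted under its row key and tagged with its source column family, and the reducer pairs up the tagged values for each key. This is a self-contained simulation, not the Cassandra or Hadoop API. The two column families are stand-in in-memory maps, each row is simplified to a single value, and the names (`ReduceSideJoinSketch`, `Users`, `Orders`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of a reduce-side join; no Cassandra/Hadoop classes are used.
public class ReduceSideJoinSketch {

    // A value emitted by a mapper, tagged with the column family it came from.
    record Tagged(String source, String value) {}

    // "Map phase": each source is mapped independently; every row is emitted
    // under its row key and tagged with the source name.
    static void map(String source, Map<String, String> rows,
                    Map<String, List<Tagged>> shuffle) {
        for (Map.Entry<String, String> row : rows.entrySet()) {
            shuffle.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(source, row.getValue()));
        }
    }

    // "Reduce phase": for one key, pair every left-source value with every
    // right-source value (an inner join; keys missing on either side emit nothing).
    static List<String> reduce(String key, List<Tagged> values,
                               String left, String right) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source().equals(left)) {
                lefts.add(t.value());
            } else if (t.source().equals(right)) {
                rights.add(t.value());
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                out.add(key + ":" + l + "," + r);
            }
        }
        return out;
    }

    static List<String> join(String left, Map<String, String> leftRows,
                             String right, Map<String, String> rightRows) {
        // A TreeMap stands in for the shuffle/sort that groups all values by key.
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map(left, leftRows, shuffle);
        map(right, rightRows, shuffle);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue(), left, right));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "alice", "u2", "bob", "u3", "carol");
        Map<String, String> orders = Map.of("u1", "order-17", "u3", "order-42");
        // prints [u1:alice,order-17, u3:carol,order-42]
        System.out.println(join("Users", users, "Orders", orders));
    }
}
```

The source tag is what lets a single reducer tell the inputs apart. On the Cassandra side you would still need a way to feed more than one column family into one job, which, per the question, `ConfigHelper.setInputColumnFamily` only sets up for a single column family. That plumbing is exactly what Pig and Hive take off your hands.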
On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote:

> The easy answer is "use something like Pig or Hive that does these
> joins for you under the hood."
>
> Not actually sure what the hard answer is. :)
>
> On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock <markus.m...@gmail.com> wrote:
>> Hello,
>> with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set
>> up the map phase to read from one column family. Is it possible to have
>> multiple mapper classes each mapping over their own column family so that
>> data from multiple column families can be "joined" in the reduce phase? I
>> didn't find any documentation on how to do that.
>> One workaround I see is to run several MRs that write the data from the different
>> column families into a single helper column family and then do the desired
>> computation, but I am trying to avoid that if possible. Any suggestions on
>> how to do this without running multiple MRs and instead read from multiple
>> column families in one go?
>> Thanks.
>> -- Markus
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com