Re: Multiple input column families in Cassandra Hadoop mapreduce

Jeremy Hanna Fri, 15 Jul 2011 15:35:49 -0700

+1. We do a lot of this with Pig, joining over several column families.  Pig 
makes it just work, and I think Hive does something similar.  Unless you really 
need that much control over your process, I would use one of those two.
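
For anyone who does need that control, here is a rough sketch of the reduce-side join pattern the original question describes, in plain Java with no Hadoop or Cassandra classes. It only illustrates the idea: each "mapper" tags its output with the column family it read from, records are grouped by row key, and the "reducer" pairs values that share a key. The class, the method names, and the "Users" and "Orders" column families are all made up for illustration. How to feed two column families into one real job is exactly the part this thread leaves open, so the sketch does not attempt it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of a reduce-side join.
// Nothing here is part of the Cassandra or Hadoop APIs.
public class ReduceSideJoinSketch {

    /** A mapper output value tagged with the column family it came from. */
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    /** "Map" step: emit (rowKey, tagged value) for every row of one column family. */
    static void mapColumnFamily(String source, Map<String, String> rows,
                                Map<String, List<Tagged>> shuffle) {
        for (Map.Entry<String, String> row : rows.entrySet()) {
            shuffle.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(source, row.getValue()));
        }
    }

    /** "Reduce" step: keep keys that have a value from both sources (inner join). */
    static Map<String, String> reduceJoin(Map<String, List<Tagged>> shuffle,
                                          String left, String right) {
        Map<String, String> joined = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> group : shuffle.entrySet()) {
            String l = null;
            String r = null;
            for (Tagged t : group.getValue()) {
                if (t.source.equals(left)) {
                    l = t.value;
                } else if (t.source.equals(right)) {
                    r = t.value;
                }
            }
            if (l != null && r != null) {
                joined.put(group.getKey(), l + "|" + r);
            }
        }
        return joined;
    }

    /** Runs both "mappers" into one shuffle map, then the join "reducer". */
    public static Map<String, String> join(Map<String, String> users,
                                           Map<String, String> orders) {
        Map<String, List<Tagged>> shuffle = new HashMap<>();
        mapColumnFamily("Users", users, shuffle);
        mapColumnFamily("Orders", orders, shuffle);
        return reduceJoin(shuffle, "Users", "Orders");
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("k1", "alice");
        users.put("k2", "bob");

        Map<String, String> orders = new HashMap<>();
        orders.put("k1", "book");
        orders.put("k3", "pen");

        // Only k1 appears in both inputs, so only k1 survives the join.
        System.out.println(join(users, orders));
    }
}
```

The tag is what lets a single reducer tell the two inputs apart. Pig and Hive generate the equivalent of this for you, which is why they are the easier route.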


On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote:

> The easy answer is "use something like Pig or Hive that does these
> joins for you under the hood."
> 
> Not actually sure what the hard answer is. :)
> 
> On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock <markus.m...@gmail.com> wrote:
>> Hello,
>> with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set
>> up the map phase to read from one column family. Is it possible to have
>> multiple mapper classes each mapping over their own column family so that
>> data from multiple column families can be "joined" in the reduce phase? I
>> didn't find any documentation on how to do that.
>> One workaround I see is to run several MR jobs that write the data from the
>> different column families into a single helper column family and then do
>> the desired computation on that, but I am trying to avoid that if possible.
>> Any suggestions on how to read from multiple column families in one go,
>> without running multiple MR jobs?
>> Thanks.
>>   -- Markus
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
