I haven't used CompositeInputFormat in particular, but this is pretty trivial to do with Pig, and I would imagine it would just do the right thing under the covers. It's a simple join in Pig. We use Pygmalion to pull the column values out of the Cassandra bag. A simple example would be:

DEFINE FromCassandraBag org.pygmalion.udf.FromCassandraBag();

-- Load the billing_account column family and pull named columns out of the bag
raw_billing_account = LOAD 'cassandra://voltron/billing_account'
    USING org.apache.cassandra.hadoop.pig.CassandraStorage()
    AS (id:chararray, columns:bag {column:tuple (name, value)});

billing_account = FOREACH raw_billing_account GENERATE
    id,
    FLATTEN(FromCassandraBag('name, age, address, city, state, zip', columns)) AS (
        name: chararray,
        age: chararray,
        address: chararray,
        city: chararray,
        state: chararray,
        zip: chararray
    );

-- Same for the game_account column family
raw_game_account = LOAD 'cassandra://voltron/game_account'
    USING org.apache.cassandra.hadoop.pig.CassandraStorage()
    AS (id:chararray, columns:bag {column:tuple (name, value)});

game_account = FOREACH raw_game_account GENERATE
    id,
    FLATTEN(FromCassandraBag('username, level, experience_points, super_powers, vehicles', columns)) AS (
        username: chararray,
        level: chararray,
        experience_points: chararray,
        super_powers: chararray,
        vehicles: chararray
    );

-- Join the two relations on the shared row key
composite_relation = FOREACH (JOIN billing_account BY id, game_account BY id) GENERATE
    billing_account::id AS id,
    name,
    username,
    level,
    super_powers;

Anyway, I'm not sure if that's what you're looking for, but that's what we do a lot of with Pig: joins on any attribute, group-bys, and things like that.

On Mar 1, 2012, at 4:45 AM, Benoit Mathieu wrote:

> Hi all,
>
> I want to write a MapReduce job with a Map task taking its data from 2
> CFs. Those 2 CFs have the same row keys and are in same keyspace, so
> they are partionned the same way across my cluster and it would be
> nice that the Map task reads the both column families locally.
>
> In hadoop package org.apache.hadoop.mapred.join, there is a
> CompositeInputFormat class, which seems to do what I want, but it
> seems related to HDFS files as the "compose" method takes "Path" args.
>
> Does anyone have ever wrote a CompositeColumnFamilyInputFormat ? or
> have any insight about it ?
>
> Cheers,
>
> Benoit