Does anyone have any thoughts on this? I'm completely out of ideas.
On Thu, May 30, 2013 at 3:12 PM, Pradeep Gollakota <[email protected]> wrote:

> Hey guys,
>
> I have a custom Storage function that loads from the Accumulo database
> (similar to HBase).
> I have the following script that I'm trying to execute:
>
> A = load 'accumulo://table_a'
>     using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2',
>     '-loadKey')
>     as (id: chararray, a: chararray, b: chararray);
> B = load 'accumulo://table_b'
>     using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2',
>     '-loadKey')
>     as (id: chararray, a: chararray, b: chararray);
> C = join A by a, B by b;
> dump C;
>
> When I execute this dataset A is not getting loaded.
> If I do the following:
> C = join B by b, A by a;
> A is loaded, but B is not.
>
> The current work around I have for this is to store A and B into temporary
> storage using PigStorage() and load them again to do my join. However,
> that's extra read/write phases that I'd like to avoid. In my implementation
> of the AccumuloStorage() function, I set pig.noSplitCombination to true.
>
> I'm not sure what the problem with my LoadFunc is and why it's not loading
> both datasets correctly.
>
> Any help would be appreciated.
>
> Thanks
> Pradeep
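
For reference, the workaround mentioned in the quoted message looks roughly like the sketch below. The /tmp paths and the alias names A_tmp and B_tmp are placeholders made up for illustration, and I'm assuming the default PigStorage() delimiter is fine for these fields. It may be safer to run the store and the reload as two separate scripts.

```
-- Load both tables with the custom loader, as in the original script.
A = load 'accumulo://table_a'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
B = load 'accumulo://table_b'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);

-- Stage each relation in temporary storage (placeholder paths).
store A into '/tmp/table_a' using PigStorage();
store B into '/tmp/table_b' using PigStorage();

-- Reload the staged copies and join those instead.
A_tmp = load '/tmp/table_a' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
B_tmp = load '/tmp/table_b' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
C = join A_tmp by a, B_tmp by b;
dump C;
```

This gives the right join result, but it costs an extra write and read of both tables, which is exactly what I'd like to avoid.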
