I think you want something like this:
X = load 'data3' as (a: chararray, b: bag{(c: chararray)});
Y = foreach X generate a, FLATTEN(b);
The contents of 'data3' do not seem to match that schema, though. Your dump
of X shows the entire line loaded into 'a' as a single chararray, with 'b'
left empty, instead of the two fields you expect. That probably has to do
with the way 'data3' was generated or with the Storage() load function you
are using.
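
One possible cause, and this is an assumption since I have not seen how
'data3' was written: a load without a using clause goes through PigStorage,
which splits fields on tabs. A line written as a parenthesized tuple with
commas then comes in as one field. Below is an untested sketch that assumes
you can rewrite 'data3' as tab-delimited text; the file layout in the
comments is my guess at what PigStorage wants, not something taken from your
setup.

```pig
-- Assumed layout of data3: one record per line, a TAB between the two
-- fields, no outer parentheses and no quotes:
--   3<TAB>{(5),(6)}

X = load 'data3' using PigStorage('\t') as (a: chararray, b: bag{(c: chararray)});

-- one output row per element of the bag b, each paired with a
Y = foreach X generate a, FLATTEN(b);

-- just the first field
Z = foreach X generate a;

dump Y;
dump Z;
```

If the quotes and spaces stay in the file, I would expect them to end up
inside the chararray values, so it is probably easiest to drop them when
generating the data.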
On Wed, May 22, 2013 at 11:51 AM, Ho Duc Ha <[email protected]> wrote:
> Actually I think you're right, the process in map/reduce isn't so
> different.
>
> However, after trying to do this, we can't understand the output we see
> below. We expected to see only '3' in alias Z, and '5' and '6' in alias Y,
> but neither result was as expected.
>
> X = load 'data3' as ( a:chararray, b:bag{(c:chararray)} );
> Y = foreach X { W = foreach b generate *; generate W; };
> Z = foreach X generate a;
>
> data3
> ( '3', {( '5' ),('6')} )
>
> dump X
> (( '3', {( '5' ),('6')} ),)
>
> dump Y
> ({})
>
> dump Z
> (( '3', {( '5' ),('6')} ))
>
>
>
>
> On Wed, May 22, 2013 at 8:25 PM, Pradeep Gollakota <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm a beginner Pig user and this is my first post to the Pig mailing
> > list.
> >
> > Anyway, to answer your question, the first thing that comes to my mind is
> > that Pig may not be able to do a complex join like that.
> >
> > However, you can first flatten the bag in A, then do your join, and then
> > do a group by to get the result in the format you are looking for. This
> > may not be an ideal solution, but it should work.
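> >
> > A rough, untested sketch of that flatten / join / group idea. It assumes
> > data1 and data2 are tab-delimited text and that A is loaded with a1 and
> > a2 as top-level fields rather than wrapped in a tuple. The aliases flat,
> > joined, pairs, by_thing, nested, by_a1 and result are names made up for
> > illustration:
> >
> > ```pig
> > A = load 'data1' as (a1:chararray, a2:bag{(a2t:chararray)});
> > B = load 'data2' as (a2t:chararray, x:chararray);
> >
> > -- one row per (a1, a2t) pair
> > flat = foreach A generate a1, FLATTEN(a2) as a2t;
> >
> > -- now the join key is a plain field on both sides
> > joined = join flat by a2t, B by a2t;
> > pairs = foreach joined generate flat::a1 as a1, flat::a2t as a2t, B::x as x;
> >
> > -- collect the x values for each (a1, a2t)
> > by_thing = group pairs by (a1, a2t);
> > nested = foreach by_thing generate FLATTEN(group) as (a1, a2t), pairs.x as xs;
> >
> > -- collect the (a2t, xs) tuples for each a1
> > by_a1 = group nested by a1;
> > result = foreach by_a1 generate group as a1, nested.(a2t, xs) as a2;
> > ```
> >
> > Note that the inner join drops any a2 entry with no match in data2; a
> > left outer join would keep those if you need them.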
> >
> > Pradeep
> >
> >
> > On Wed, May 22, 2013 at 8:49 AM, Ho Duc Ha <[email protected]> wrote:
> >
> > > We've got a data type that is modeled after a typical object-oriented
> > > data-model format (simple fields, and collections of other objects).
> > > We're trying to accomplish the following join:
> > >
> > > Here's our example input:
> > > -------------------------------------
> > > data1 = { ( 'a1', { ('a2-thing1'), ('a2-thing2') } ) }
> > > data2 = { ( 'a2-thing1', 'x-value1' ), ( 'a2-thing1', 'x-value2' ) }
> > >
> > > Here's what we want to get:
> > > --------------------------------------
> > > ( 'a1', { ('a2-thing1', {
> > > ('x-value1'), ('x-value2') }
> > > ) }
> > > )
> > >
> > > Notice that we are trying to join the collection of a2 fields of the
> > > 1st data set, on the first field in the 2nd data set.
> > >
> > > We tried this:
> > > --------------------
> > > A = load 'data1' as ( a:tuple(a1:chararray, a2:bag{(a2t:chararray)}) );
> > > B = load 'data2' as ( a2t:chararray, x:chararray );
> > > X = join A by a2.a2t, B by a2t;
> > >
> > > We get this error:
> > > ---------------------------
> > > ERROR 1128: Cannot find field a2t in
> > > a1:chararray,a2:bag{:tuple(a2t:chararray)}
> > >
> > > Try as we might, we cannot find the right way to do this complex join.
> > > Questions:
> > > 1) Should we be simplifying our data format into a more SQL
> > > table-like structure and doing more joins to reduce the complexity?
> > > 2) How can we accomplish joining data2's data into the data1
> > > "objects"?
> > >
> > > --
> > > Ho Duc Ha
> > >
> >
>
>
>
> --
> Ho Duc Ha
>