We changed the load statement to:
X = load 'data3' using PigStorage() as ( a:chararray, b:bag{(c:chararray)} );
But we get the same results with your statement:
Y = FOREACH X GENERATE b;
dump Y;
Output of the above command:
-----------------------------------------
()
What we really want to create is the set of tuples in bag b:
('5'),('6')
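The dump of X in the earlier message, where the whole line ends up in a and b is empty, suggests the problem is the input file rather than the FOREACH. PigStorage() splits each line on a tab by default, and the sample line contains no tab. The outer parentheses and the single quotes are DUMP's display syntax, not Pig's input format; quotes in the file would be kept as literal characters in the values. A minimal sketch, assuming data3 is rewritten with a literal tab between the two fields and no quotes:

```pig
-- data3 (assumed layout; <TAB> stands for a literal tab character):
--   3<TAB>{(5),(6)}
X = LOAD 'data3' USING PigStorage('\t')
        AS (a:chararray, b:bag{t:tuple(c:chararray)});

-- FLATTEN turns each tuple of the bag into its own output row,
-- so Y should hold one record per element of b
Y = FOREACH X GENERATE FLATTEN(b);
DUMP Y;

-- With the fields split correctly, Z holds only the scalar field a
Z = FOREACH X GENERATE a;
DUMP Z;
```

FLATTEN is what produces separate rows; GENERATE b alone returns the whole bag as a single field of one row.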
Another example which seems to fail to load properly is this (using ints
instead of strings):
file: data4
-------------
( 3, {(5),(6)} )
X1 = load 'data4' using PigStorage() as ( a:int, b:bag{(c:int)} );
dump X1;
result:
---------
(,)
We also tried formatting the data like this, wrapped in the extra tuple I
often see in the output, with no luck:
((3, {(5),(6)} ))
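The (,) result fits the same explanation: with no tab, the whole line becomes field a, converting that text to int fails, and Pig typically reports a failed conversion as a null (with a warning) rather than an error, so both fields come back empty. An extra layer of parentheses does not help for the same reason. A sketch assuming data4 gets the same tab-separated layout, plus a filter that makes rows which failed to parse easy to spot:

```pig
-- data4 (assumed layout; <TAB> stands for a literal tab character):
--   3<TAB>{(5),(6)}
X1 = LOAD 'data4' USING PigStorage('\t')
        AS (a:int, b:bag{t:tuple(c:int)});

-- Failed conversions surface as nulls rather than errors,
-- so this is a quick check that the file matches the schema
bad = FILTER X1 BY a IS NULL OR b IS NULL;
DUMP bad;   -- should be empty once the input format is right

-- One row per bag element, with c typed as int
Y1 = FOREACH X1 GENERATE FLATTEN(b);
DUMP Y1;
```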
On Wed, May 22, 2013 at 11:32 PM, Sergey Goder <[email protected]> wrote:
> Looks like you're probably not reading the data in correctly. Perhaps you
> need to specify the USING PigStorage() syntax and specify the correct
> delimiter as an argument.
>
> Also, if you want Y to be just the bag, you can write it as:
>
> Y = FOREACH X GENERATE b;
>
>
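Sergey's delimiter suggestion, sketched: PigStorage takes the field delimiter as its argument and uses a tab when none is given. The file name data3_pipe and the '|' separator below are hypothetical. Whichever delimiter is chosen should not occur inside the bag text, because PigStorage splits on every occurrence of it; a comma, for example, would cut {(5),(6)} in two.

```pig
-- data3_pipe (hypothetical file; '|' separates the two fields):
--   3|{(5),(6)}
X = LOAD 'data3_pipe' USING PigStorage('|')
        AS (a:chararray, b:bag{t:tuple(c:chararray)});

-- DESCRIBE prints the declared schema; DUMP shows what was actually parsed
DESCRIBE X;
DUMP X;
```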
> On Wed, May 22, 2013 at 8:51 AM, Ho Duc Ha <[email protected]> wrote:
>
> > Actually I think you're right, the process in map/reduce isn't so
> > different.
> >
> > However, after trying to do this, we can't understand the output we see
> > below. We expected to see only '3' in alias Z, and '5' and '6' in alias
> > Y; neither result was as expected.
> >
> > X = load 'data3' as ( a:chararray, b:bag{(c:chararray)} );
> > Y = foreach X { W = foreach b generate *; generate W; };
> > Z = foreach X generate a;
> >
> > data3
> > ( '3', {( '5' ),('6')} )
> >
> > dump X
> > (( '3', {( '5' ),('6')} ),)
> >
> > dump Y
> > ({})
> >
> > dump Z
> > (( '3', {( '5' ),('6')} ))
> >
> >
> >
> >
> > On Wed, May 22, 2013 at 8:25 PM, Pradeep Gollakota <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I'm a beginner Pig user and this is my first post to the Pig mailing list.
> > >
> > > Anyway, to answer your question, the first thing that comes to my mind is
> > > that Pig may not be able to do a complex join like that.
> > >
> > > However, you can first flatten the bag in A, then do your join, and then
> > > do a GROUP BY to get the result in the format you are looking for. This
> > > may not be an ideal solution, but it should work.
> > >
> > > Pradeep
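Pradeep's flatten, join, then group approach could look like the sketch below. It assumes data1 and data2 are tab-delimited, with a1 and the bag stored as two top-level fields in data1 rather than wrapped in an outer tuple as in the original LOAD. The aliases A_flat, J, J2, G1, P, G2 and R are made up for illustration.

```pig
-- data1 (assumed):  a1<TAB>{(a2-thing1),(a2-thing2)}
-- data2 (assumed):  a2-thing1<TAB>x-value1
A = LOAD 'data1' USING PigStorage('\t')
        AS (a1:chararray, a2:bag{t:tuple(a2t:chararray)});
B = LOAD 'data2' USING PigStorage('\t')
        AS (a2t:chararray, x:chararray);

-- 1. Flatten: one row per (a1, a2t) pair, so the join key is a scalar
A_flat = FOREACH A GENERATE a1, FLATTEN(a2) AS a2t;

-- 2. Ordinary equi-join on that key (inner join: a2t values with
--    no match in data2 are dropped; LEFT OUTER would keep them)
J  = JOIN A_flat BY a2t, B BY a2t;
J2 = FOREACH J GENERATE A_flat::a1 AS a1, A_flat::a2t AS a2t, B::x AS x;

-- 3a. Collect the x values for each (a1, a2t) pair
G1 = GROUP J2 BY (a1, a2t);
P  = FOREACH G1 GENERATE FLATTEN(group) AS (a1, a2t), J2.x AS xs;

-- 3b. Collect the (a2t, xs) pairs for each a1
G2 = GROUP P BY a1;
R  = FOREACH G2 GENERATE group AS a1, P.(a2t, xs) AS a2;
DUMP R;
```

With the sample rows from the original question this should produce a record shaped like (a1,{(a2-thing1,{(x-value1),(x-value2)})}), with no guaranteed order inside the bags; a2-thing2 drops out because data2 has no row for it.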
> > >
> > >
> > > On Wed, May 22, 2013 at 8:49 AM, Ho Duc Ha <[email protected]> wrote:
> > >
> > > > We've got a data type that is modeled after a typical object-oriented
> > > > data model (simple fields, and collections of other objects). We're
> > > > trying to accomplish the following join:
> > > >
> > > > Here's our example input:
> > > > -------------------------------------
> > > > data1 = { ( 'a1', { ('a2-thing1'), ('a2-thing2') } ) }
> > > > data2 = { ( 'a2-thing1', 'x-value1' ), ( 'a2-thing1', 'x-value2' ) }
> > > >
> > > > Here's what we want to get:
> > > > --------------------------------------
> > > > ( 'a1', { ( 'a2-thing1', { ('x-value1'), ('x-value2') } ) } )
> > > >
> > > > Notice that we are trying to join the collection of a2 values in the
> > > > first data set on the first field of the second data set.
> > > >
> > > > We tried this:
> > > > --------------------
> > > > A = load 'data1' as ( a:tuple(a1:chararray, a2:bag{(a2t:chararray)}) );
> > > > B = load 'data2' as ( a2t:chararray, x:chararray );
> > > > X = join A by a2.a2t, B by a2t;
> > > >
> > > > We get this error:
> > > > ---------------------------
> > > > ERROR 1128: Cannot find field a2t in
> > > > a1:chararray,a2:bag{:tuple(a2t:chararray)}
> > > >
> > > > Try as we might, we cannot find the right way to do this complex join.
> > > > Questions:
> > > > 1) Should we simplify our data format into a more SQL table-like
> > > > structure and do more joins to reduce the complexity?
> > > > 2) How can we join data2's data into the data1 "objects"?
> > > >
> > > > --
> > > > Ho Duc Ha
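For question 2, COGROUP is another approach worth trying. After the same FLATTEN step, it gathers the matching rows of both inputs for each a2t value in one operation, which replaces the separate JOIN and first GROUP in Pradeep's flatten, join, and group suggestion. The flattened A_flat relation is also roughly the "SQL table-like" shape asked about in question 1. A sketch under assumed conditions (tab-delimited files, a1 and the bag as top-level fields in data1; the aliases A_flat, C, D, E and R are illustrative):

```pig
A = LOAD 'data1' USING PigStorage('\t')
        AS (a1:chararray, a2:bag{t:tuple(a2t:chararray)});
B = LOAD 'data2' USING PigStorage('\t')
        AS (a2t:chararray, x:chararray);

-- Flattened, table-like view of data1: one (a1, a2t) row per bag element
A_flat = FOREACH A GENERATE a1, FLATTEN(a2) AS a2t;

-- One record per a2t value: (group, {A_flat rows}, {B rows})
C = COGROUP A_flat BY a2t, B BY a2t;

-- Flattening an empty A_flat bag drops the record, so only keys from
-- data1 survive; an a2t with no data2 rows keeps an empty xs bag
D = FOREACH C GENERATE FLATTEN(A_flat.a1) AS a1, group AS a2t, B.x AS xs;

-- Rebuild the nesting per a1
E = GROUP D BY a1;
R = FOREACH E GENERATE group AS a1, D.(a2t, xs) AS a2;
DUMP R;
```

Unlike the inner join, this keeps a2-thing2 in the output with an empty bag of x values, because COGROUP has outer semantics by default.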
> > > >
> > >
> >
> >
> >
> > --
> > Ho Duc Ha
> >
>
--
Ho Duc Ha