Here is how to use rank and join for this problem:
sh cat xxx
1,2,3,4,5
1,2,4,5,7
1,5,7,8,9
sh cat yyy
10,11
10,12
10,13
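a = load 'xxx' using PigStorage(',');
b = load 'yyy' using PigStorage(',');
-- rank prepends a sequential row number as $0
a2 = rank a;
b2 = rank b;
-- join the two relations on that row number
c = join a2 by $0, b2 by $0;
-- $6 is b2's rank column; ordering on it restores the original row order
c2 = order c by $6;
-- drop both rank columns ($0 and $6)
c3 = foreach c2 generate $1 .. $5, $7 ..;
dump c3;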
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
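The same idea can be written with named fields instead of positional references, which may be easier to follow. This is only a sketch: the field names f1..f5 and g1..g2 and the int types are made up for illustration, and it assumes (as the script above does) that both files have the same number of rows and that rank numbers them in file order. It also relies on rank naming its added column rank_<alias>, so here rank_a and rank_b.

```pig
-- Hypothetical schemas; the original files carry no field names.
a = load 'xxx' using PigStorage(',') as (f1:int, f2:int, f3:int, f4:int, f5:int);
b = load 'yyy' using PigStorage(',') as (g1:int, g2:int);

-- rank with no BY clause prepends a sequential row number.
ar = rank a;
br = rank b;

-- Pair up row N of a with row N of b.
j = join ar by rank_a, br by rank_b;

-- Restore the original row order, then drop the two rank columns.
s = order j by ar::rank_a;
out = foreach s generate ar::f1 .. ar::f5, br::g1 .. br::g2;
dump out;
```

If the two relations have different row counts, the unmatched rows are dropped by the inner join.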
William F Dowling
Senior Technologist
Thomson Reuters
-----Original Message-----
From: Christopher Surage [mailto:[email protected]]
Sent: Tuesday, March 25, 2014 4:03 PM
To: [email protected]
Subject: Re: Any way to join two aliases without using CROSS
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <[email protected]> wrote:
> I don't understand what you're trying to do from your example.
>
> If you perform a cross on the data you have, the output will be the
> following:
>
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,2,4,5,7,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
> (1,5,7,8,9,10,11)
>
> On this, you'll have to do a distinct to get what you're looking for.
>
> Let's change the example a little bit so we get a clearer understanding
> of your problem. What would be the output if your two relations looked as
> follows:
>
> (1,2,3,4,5) (10,11)
> (1,2,4,5,7) (10,12)
> (1,5,7,8,9) (10,13)
>
>
> On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <[email protected]> wrote:
>
> > Have you tried iterating over the first relation and in the nested
> > *generate* clause, always appending the second relation? Your top level
> > looping is on first relation but in the nested block you are sort of
> > hardcoding appending of second relation.
> >
> > I am referring to the examples like in "Example: Nested Blocks" section
> > http://pig.apache.org/docs/r0.10.0/basic.html#foreach
> >
> >
> > On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <[email protected]> wrote:
> >
> > > I am trying to perform the following action, but the only solution I have
> > > been able to come up with is using a CROSS, but I don't want to use that
> > > statement as it is a very expensive process.
> > >
> > > (1,2,3,4,5) (10,11)
> > > (1,2,4,5,7) (10,11)
> > > (1,5,7,8,9) (10,11)
> > >
> > >
> > > I want to make it
> > > (1,2,3,4,5,10,11)
> > > (1,2,4,5,7,10,11)
> > > (1,5,7,8,9,10,11)
> > >
> > > any help would be much appreciated,
> > >
> > > Chris
> > >
> >
>