Hi Thomas,
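Try this. It flattens the bag, drops the tuples whose number is greater
than n, and then keeps the tuple with the largest remaining number for each n: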
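-- Load each row as an int plus a bag of (number, string) tuples.
data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int, B:bag{(m:int,s:chararray)});
-- Flatten the bag: one row per (n, m, s).
data2 = FOREACH data1 GENERATE n, FLATTEN(B);
-- Keep only the tuples whose number does not exceed n.
data3 = FILTER data2 BY B::m <= n;
data4 = GROUP data3 BY n;
-- Within each group, sort by m descending and keep the top tuple.
data5 = FOREACH data4 {
    data6 = ORDER data3 BY B::m DESC;
    data7 = LIMIT data6 1;
    GENERATE data7;
}
-- Unwrap the single-tuple bag and project the fields we want.
data8 = FOREACH data5 GENERATE FLATTEN(data7);
data9 = FOREACH data8 GENERATE n, B::s;
DUMP data9;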
The input (1.txt, with '|' as the field delimiter) is:
4|{(1,abc),(2,cde),(5,efg)}
2|{(1,foo),(2,bar),(5,baz)}
7|{(1,bounce),(2,frotz),(5,trotz)}
The output is:
(2,bar)
(4,cde)
(7,trotz)
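If you want to sanity-check the logic outside Pig, here is a rough sketch
of the same steps in plain Python. It is not Pig code: the function name
pick_largest_le and the in-memory rows are made up for illustration, and it
assumes each n appears only once, as in your sample (the Pig script groups
by n, so duplicate n values would be merged there).

```python
def pick_largest_le(rows):
    """For each (n, bag) row, keep the string whose number is the largest one <= n."""
    result = []
    for n, bag in rows:
        # Same idea as the FILTER step: drop tuples whose number exceeds n.
        candidates = [(m, s) for m, s in bag if m <= n]
        if not candidates:
            # As in the Pig script, a row with no qualifying tuple yields no output.
            continue
        # Same idea as ORDER ... DESC plus LIMIT 1: take the tuple with the largest m.
        m, s = max(candidates, key=lambda t: t[0])
        result.append((n, s))
    return result


# Hypothetical in-memory copy of the sample input.
rows = [
    (4, [(1, "abc"), (2, "cde"), (5, "efg")]),
    (2, [(1, "foo"), (2, "bar"), (5, "baz")]),
    (7, [(1, "bounce"), (2, "frotz"), (5, "trotz")]),
]

print(sorted(pick_largest_le(rows)))
# [(2, 'bar'), (4, 'cde'), (7, 'trotz')]
```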
Thanks,
Cheolsoo
On Tue, Jan 22, 2013 at 8:24 AM, Thomas Bach <[email protected]> wrote:
> On Tue, Jan 22, 2013 at 12:55:22PM +0100, Thomas Bach wrote:
> > Hi there,
> >
> > I have the following data
> >
> > 4 {(1,abc),(2,cde),(5,efg)}
> > 2 {(1,foo),(2,bar),(5,baz)}
> > 7 {(1,bounce),(2,frotz),(5,trotz)}
> >
> > what I finally want to achieve is a list of all strings related to the
> > largest number in the tuple that is less-equal the first number in
> > the row. i.e.:
> >
> > (4,cde)
> > (2,bar)
> > (5,trotz)
> >
>
> This should be
>
> (4,cde)
> (2,bar)
> (7,trotz)
>
> of course.
>
> Regards,
> Thomas Bach.
>