Hi Alan,
Thanks for your reply.
I am trying to understand how Pig processes these relations. As I mentioned, my
UDF returns its result in the following format:
{(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
{(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
{(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
{(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
B = foreach A { /* Each element in A is a bag, so the statements below are
applied to each bag in A. Is this correct? */
B1 = order A by $0; -- order on the id
/* What does this A refer to? Does it refer to each bag in relation A?
I get the following error: "expression is not a project expression" */
/* rest of the code */
}
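From the error, my guess is that inside the nested block the ORDER has to be
applied to the bag field of each record (b in your load statement) rather than
to the relation A itself. Below is a minimal sketch of what I think was
intended. The field name b and the empty bag schema are assumptions on my
part, and my real input comes from a UDF rather than a load:

```pig
-- Assumption: each input record holds a single bag field named b.
A = load 'yourinput' as (b:bag{});
B = foreach A {
    -- Nested operators work on a bag field of the current record,
    -- so we order the bag b, not the outer relation A.
    B1 = order b by $0;   -- order the tuples in this bag on the id
    B2 = limit B1 5;      -- keep at most 5 tuples per bag
    generate flatten(B2);
};
```

If that is right, it would give the top 5 per user rather than across all
users, which is what I am ultimately after.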
Thanks for your help.
> Subject: Re: Bag of tuples
> From: [email protected]
> Date: Wed, 6 Nov 2013 09:36:04 -0800
> To: [email protected]
>
> Do you mean you want to find the top 5 per input record? Also, what is your
> ordering criteria? Just sort by id? Something like this should order all
> tuples in each bag by id and then produce the top 5. My syntax may be a
> little off as I'm working offline and don't have the manual in front of me,
> but this should be the general idea.
>
> A = load 'yourinput' as (b:bag);
> B = foreach A {
> B1 = order A by $0; -- order on the id
> B2 = limit B1 5;
> generate flatten(B2);
> }
>
> Alan.
>
> On Nov 5, 2013, at 9:52 AM, Sameer Tilak wrote:
>
> > Hi Pig experts,
> > Sorry to post so many questions, I have one more question on doing some
> > analytics on bag of tuples.
> >
> > My input has the following format:
> >
> > {(id1,x,y,z), (id2, a, b, c), (id3,x,a)} /* User 1 info */
> > {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
> > {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
> > {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
> >
> > I can change my UDF to give more simple output. However, I want to find out
> > if something like this can be done easily:
> > I would like to find out top 5 ids (field 1 in a tuple) among all the
> > users. Note that each user has a bag and the first field of each tuple in
> > that bag is id.
> >
> > How difficult will it be to filter based on fields of tuples and do
> > analytics across the entire user base.
> >