I ran into a similar problem where I had a relation (A) which was massive and another relation (B) which had exactly 1 record. I needed to do a cross product of these two relations, and the default implementation was very slow. I worked around it by generating a synthetic key myself and then using a replicated join to cross the two relations. It looked something like the following:

data1 = load 'data1'; -- billions of records
data2 = load 'data2'; -- 1 record
A = foreach data1 generate *, 1 as fake_key;
B = foreach data2 generate *, 1 as fake_key;
-- In a replicated join the last relation listed is the one loaded into
-- memory, so the single-record relation (B) goes last.
C = join A by fake_key, B by fake_key using 'replicated';

I looked around to see if Pig supported this out of the box, but I didn't find anything. Perhaps a replicated cross operator would be helpful for these types of problems.

From the O'Reilly book, this is what is said about the cross operator:

"Pig does implement cross in a parallel fashion. It does this by generating a synthetic join key, replicating rows, and then doing the cross as a join."

Since the cross product operator is already being performed as a join under the hood, I wonder how difficult it would be to support different join strategies for cross.

On Fri, May 24, 2013 at 12:21 PM, Mehmet Tepedelenlioglu <[email protected]> wrote:

> Thanks, but is there a map-side cross? The usual cross seems to have a
> bug. I sent an example of how to replicate this bug.
>
> On 5/24/13 9:15 AM, "Jonathan Coveney" <[email protected]> wrote:
>
> >You can do this, but pig has a CROSS keyword that you can use.
> >
> >
> >2013/5/23 Mehmet Tepedelenlioglu <[email protected]>
> >
> >> Hi,
> >>
> >> I am using this:
> >>
> >> x = join a by 1, b by 1 using 'replicated';
> >>
> >> with the hope that it generates some synthetic key '1' on both a and b
> >> and joins it on that key, thereby, in this case, doing a clean map side
> >> cross of a and b with no schema changes (exactly the way a cross would
> >> work). It seems to be working, but since I just tried it and it worked,
> >> I am not sure if there is anything in there I should be aware of. Does
> >> anyone know?
> >>
> >> Thanks,
> >>
> >> Mehmet
