On Fri, Jan 07, 2011 at 10:44:03AM -0800, Thejas M Nair wrote:
> On 1/7/11 9:20 AM, "Kris Coward" <[email protected]> wrote:
> > I've got an outer bag/relation consisting of a bunch of user information,
> > one of the pieces of which is an inner bag of possible events for that
> > user, and the value of those events, should they occur. Outside the bag,
> > there are also a few data concerning whether specific events have
> > already occurred.
> >
> > In another relation, I have the assortment of events grouped with the
> > probability that any of them will occur.
> >
> > I'd like to generate expected values for each user, but know that I
> > can't JOIN within a FOREACH block (or do a nested FOREACH). For a UDF,
> > I vaguely recall some sort of constraint on nesting inner bags that
> > would interfere with my ability to bundle the possible events bag with
> > the actual events data into a single object that could be passed to a
> > UDF that extends EvalFunc.
>
> I can't think of any limitations that would prevent you from writing such a
> UDF.
> You can pass the bag of events to the UDF, and have the UDF append the
> probability information to tuples in the bag and return the new bag. I am
> assuming that the event probability relation is small enough to be stored
> in memory.
>
> > Am I misremembering something? Is there some other sort of clever
> > trickery that I might be able to use to generate expected values if I'm
> > not? (And if I am, is there something less hackish than a GROUP on a
> > unique tuple element that I could use to load the desired values into a
> > bag or tuple, or just plain pass the entire tuple to a UDF?)
>
> Is this the alternative solution you are trying to avoid? Do a (FOREACH-)
> FLATTEN on the events bag of the first relation, do a JOIN (using
> 'replicated' if the second relation is small enough), and then do a GROUP BY
> on user (id). This will not involve writing a UDF, but it will have an
> additional reduce phase for the GROUP BY. If you use a UDF that appends the
> information, it will be a map-only job.
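To make the trade-off concrete, here is a rough plain-Python model (not Pig code) of the two routes described above: FLATTEN the events bag, join each event against the small probability relation held in memory (what a replicated JOIN does), then GROUP BY user; versus a single per-record pass with the probability table in memory, as a UDF would do. The sample data, the function names `flatten_join_group` and `per_record`, and the `pos` field are all hypothetical. The expected value is assumed to be the sum of value times probability over a user's possible events, and the already-occurred flags are ignored.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration.
# Each user carries an inner bag of (event, value) pairs.
users = [
    ("u1", [("click", 2.0), ("buy", 50.0)]),
    ("u2", [("buy", 30.0), ("share", 1.0)]),
]
# The small event -> probability relation, held in memory.
probs = {"click": 0.5, "buy": 0.125, "share": 0.25}


def flatten_join_group(users, probs):
    """Models FLATTEN -> replicated JOIN -> GROUP BY user."""
    # FLATTEN: one row per event; pos records where it sat in the bag.
    flat = [(uid, pos, ev, val)
            for uid, bag in users
            for pos, (ev, val) in enumerate(bag)]
    # Replicated JOIN: look each event up in the in-memory table;
    # events with no probability are dropped, as in an inner join.
    joined = [(uid, pos, ev, val * probs[ev])
              for uid, pos, ev, val in flat if ev in probs]
    # GROUP BY user: the step that costs an extra reduce phase in Pig.
    groups = defaultdict(list)
    for uid, pos, ev, contrib in joined:
        groups[uid].append((pos, ev, contrib))
    result = {}
    for uid, rows in groups.items():
        rows.sort()  # grouping need not preserve order; pos restores it
        result[uid] = (sum(c for _, _, c in rows), [ev for _, ev, _ in rows])
    return result


def per_record(users, probs):
    """Models the map-only UDF route: one pass per user record."""
    return {uid: sum(val * probs[ev] for ev, val in bag if ev in probs)
            for uid, bag in users}


print(flatten_join_group(users, probs))
# {'u1': (7.25, ['click', 'buy']), 'u2': (4.0, ['buy', 'share'])}
print(per_record(users, probs))
# {'u1': 7.25, 'u2': 4.0}
```

Carrying an explicit position through the flatten is one way to keep the original bag order recoverable, since a GROUP does not guarantee the order of tuples in the resulting bag. There is also one behavioural difference in this model: with the inner join, a user whose events all lack a probability disappears from the grouped output, while the per-record version reports 0 for that user.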
I'm not trying to avoid that solution at all. FLATTENing the events bag and
then reGROUPing it after the JOIN seems like it's probably the solution I was
looking for. (The bag had been ORDERed before, and some information was
present in the ordering, but I can separate that information out so that it
survives FLATTEN.)

Thanks,
Kris

-- 
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
