I was under the impression that for Bag->Bag functions, providing the schema made things much faster?

2011/1/10 Dmitriy Ryaboy wrote:

> Heck, if you know the schema at runtime, you could pass in a string
> describing the schema as another argument.
> Or pass it in during initialization:
>
> define udfWithSchema myUdf('a:int, b:chararray');
>
> What do you need the schema for, exactly?
>
> D
>
> On Mon, Jan 10, 2011 at 10:36 AM, Jonathan Coveney wrote:
>
> > I thought about that, but I do not know how long the tuple is. This isn't
> > an issue from a calculation perspective, I suppose, as long as you make
> > sure that prop is the first thing in the bag. But from a schema
> > perspective... hmm, I guess you could just grab the schema of the other
> > elements and build it accordingly?
> >
> > 2011/1/10 Dmitriy Ryaboy wrote:
> >
> > > Jonathan, can't you just pass the bag A in?
> > >
> > > On Mon, Jan 10, 2011 at 9:56 AM, Jonathan Coveney wrote:
> > >
> > > > So I have a udf, let's call it myudf.bag2bag, which takes a bag which
> > > > contains "prop," and creates a new bag of tuples based on that.
> > > >
> > > > I have data in the form of
> > > >
> > > > id prop other1 other2
> > > >
> > > > If all I care about is running the udf, obviously I can do
> > > >
> > > > A = LOAD 'file' AS (id, prop, other1, other2);
> > > > B = GROUP A BY id;
> > > > C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop));
> > > >
> > > > And all is fine.
> > > >
> > > > But what do I do if I want to hold on to the other data, especially
> > > > if I don't know how much there will be (from a bag2bag perspective)?
> > > >
> > > > My thought is that in bag2bag, you can pass in a tuple of "extras,"
> > > > which you then pass back, i.e.
> > > >
> > > > C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop,
> > > > (A.other1, A.other2)));
> > > >
> > > > I'm just not sure how I would specify the schema for this, in such a
> > > > way that any number of entries could be in the tuple, and then you
> > > > could just sort of reference them later.
> > > >
> > > > Is this possible?
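
For reference, the two ideas in the thread (describe the schema as a string, and build the output schema from whatever other fields come in) can be modeled outside Pig. What follows is only a sketch in plain Python, not Pig's UDF API: bags are modeled as lists of tuples, a schema as a list of (name, type) pairs, and the names parse_schema, output_schema, and this bag2bag are hypothetical helpers made up for illustration. It assumes, as suggested above, that prop is the first field of every tuple.

```python
def parse_schema(spec):
    """Parse a string like 'a:int, b:chararray' into (name, type) pairs."""
    fields = []
    for part in spec.split(","):
        name, _, type_name = part.strip().partition(":")
        # Assumption: untyped fields fall back to bytearray, Pig's default type.
        fields.append((name.strip(), type_name.strip() or "bytearray"))
    return fields


def output_schema(input_fields, derived_fields):
    """Swap the leading prop field for the derived fields; keep the rest as-is."""
    return list(derived_fields) + list(input_fields[1:])


def bag2bag(bag, derive):
    """Apply derive() to the first element of each tuple, pass the extras through."""
    return [tuple(derive(t[0])) + tuple(t[1:]) for t in bag]


# Example: prop becomes two derived columns; other1 and other2 ride along.
in_fields = parse_schema("prop:chararray, other1:int, other2:chararray")
out_fields = output_schema(
    in_fields, parse_schema("prop_upper:chararray, prop_len:int")
)
rows = bag2bag([("ab", 1, "x"), ("cde", 2, "y")], lambda p: (p.upper(), len(p)))
```

Because the extras are copied positionally, the sketch never needs to know how many of them there are; the output schema is just the derived fields followed by whatever input fields came after prop.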
