I was under the impression that for Bag->Bag functions, providing the schema
made things much faster?

2011/1/10 Dmitriy Ryaboy <[email protected]>

> Heck, if you know the schema at runtime, you could pass in a string
> describing the schema as another argument.
> Or pass it in during initialization:
>
> define udfWithSchema myUdf('a:int, b:chararray')
>
> What do you need the schema for, exactly?
>
> D
>
> On Mon, Jan 10, 2011 at 10:36 AM, Jonathan Coveney <[email protected]> wrote:
>
> > I thought about that, but I do not know how long the tuple is. This isn't an
> > issue from a calculation perspective, I suppose, as long as you make sure
> > that prop is the first thing in the bag. But from a schema perspective...
> > hmm, I guess you could just grab the schema of the other elements and build
> > it accordingly?
> >
> > 2011/1/10 Dmitriy Ryaboy <[email protected]>
> >
> > > Jonathan, can't you just pass the bag A in?
> > >
> > > On Mon, Jan 10, 2011 at 9:56 AM, Jonathan Coveney <[email protected]> wrote:
> > >
> > > > So I have a udf, let's call it myudf.bag2bag, which takes a bag which
> > > > contains "prop," and creates a new bag of tuples based on that.
> > > >
> > > > I have data in the form of
> > > >
> > > > id    prop    other1    other2
> > > >
> > > > If all I care about is running the udf, obviously I can do
> > > >
> > > > A = LOAD 'file' AS (id, prop, other1, other2);
> > > > B = GROUP A BY id;
> > > > C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A.prop));
> > > >
> > > > And all is fine
> > > >
> > > > But what do I do if I want to hold on to the other data, especially if
> > > > you don't know how much there will be (from a bag2bag perspective)?
> > > >
> > > > My thought is that in bag2bag, you can pass in a tuple of "extras,"
> > > > which you then pass back, i.e.
> > > >
> > > > C = FOREACH B GENERATE group,
> > > >     FLATTEN(myudf.bag2bag(A.prop, (A.other1, A.other2)));
> > > >
> > > > I'm just not sure how I would specify the schema for this, in such a way
> > > > that any number of entries could be in the tuple, and then you could just
> > > > sort of reference them later.
> > > >
> > > > Is this possible?
> > > >
> > >
> >
>
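
For reference, the "just pass the bag A in" suggestion from the thread might look roughly like the sketch below. It assumes myudf.bag2bag has been rewritten to accept whole tuples rather than only the prop column, which is not something the thread confirms; the field positions in the comments follow the LOAD schema from the original question.

```pig
-- Sketch only: hand the UDF the whole grouped bag so it sees every column.
A = LOAD 'file' AS (id, prop, other1, other2);
B = GROUP A BY id;

-- Each tuple in bag A is (id, prop, other1, other2). The UDF (assumed here to
-- be adapted for this) would read prop at position 1 and carry whatever
-- fields follow it through to its output tuples.
C = FOREACH B GENERATE group, FLATTEN(myudf.bag2bag(A));
```

With this shape the script never has to enumerate other1, other2, and so on, so the number of extra columns can vary. The cost is that the UDF has to know where prop sits in each tuple, and its output schema would have to be built from the input schema if the extra fields are to be referenced by name later.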
