A replicated cross (implemented as a replicated join on a synthetic key) is
probably your best bet.
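
For what it's worth, here is a rough, untested sketch of how that could look. The file paths, schemas, the UDF file name (my_udfs.py), and the synthetic key name (jk) are all made up for illustration; only udf.my_python_udf comes from the original question. I am also assuming you want the whole dictionary handed to the UDF as a single bag, so the sketch collapses it with GROUP ... ALL before joining. If you skip that step you get a plain cross, with one output tuple per (row, dictionary entry) pair.

```pig
-- Hypothetical UDF file; 'udf' matches the namespace used in the question.
REGISTER 'my_udfs.py' USING jython AS udf;

-- Hypothetical paths and schemas. Pointing LOAD at a directory or glob
-- picks up a dictionary that is spread over several files.
rows           = LOAD 'rows' AS (line:chararray);
tinyDictionary = LOAD 'tiny_dictionary' AS (k:chararray, v:chararray);

-- Collapse the dictionary into one tuple holding a single bag,
-- and tag it with a constant (synthetic) join key.
dict_grouped = GROUP tinyDictionary ALL;
dict_keyed   = FOREACH dict_grouped GENERATE 1 AS jk, tinyDictionary AS dict_bag;

-- Tag every row with the same synthetic key.
rows_keyed = FOREACH rows GENERATE 1 AS jk, line;

-- Replicated (map-side) join: the large relation goes first, the small
-- relation listed last is loaded into memory on each mapper.
joined = JOIN rows_keyed BY jk, dict_keyed BY jk USING 'replicated';

-- Each row now carries the whole dictionary bag alongside it.
result = FOREACH joined GENERATE
    udf.my_python_udf(rows_keyed::line, dict_keyed::dict_bag);
```

Since the join is done map-side, having every row share one key should not funnel everything through a single reducer, which is why the replicated variant matters here.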


On Wed, Oct 23, 2013 at 2:09 PM, Daniel Dai <[email protected]> wrote:

> Can you do a cross?
>
>
> On Mon, Oct 21, 2013 at 2:21 PM, Serega Sheypak <[email protected]> wrote:
>
> > Hi, I have two relations:
> > relation *rows* (>10GB)
> > relation *tinyDictionary* (<1MB)
> >
> > I want to take each tuple from *rows*, attach *tinyDictionary* to it,
> > and then pass both to a Python UDF:
> >
> > result = FOREACH someRelation GENERATE
> >     udf.my_python_udf(single_row_from_*Rows*, whole*TinyDictionary*);
> >
> > How can I do that?
> >
> > There is a solution using DistributedCache, but I would like to
> > avoid Java stuff.
> > Also, *TinyDictionary* could be spread across several files, which
> > would make that approach hard to deal with.
> >
>
