A replicated cross (implemented as a replicated join on a synthetic key) is probably your best bet.
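
Something along these lines might work. It is only a sketch, not tested against your data: the paths, the field names (line, k, v), the file name my_udfs.py and the constant key jk are all made up for illustration. I also group the dictionary into a single bag first, so that each row is paired with the whole dictionary exactly once instead of once per dictionary entry.

```pig
-- Hypothetical UDF file; the name is an assumption.
REGISTER 'my_udfs.py' USING jython AS udf;

-- Hypothetical paths and schemas. LOAD accepts a directory or glob,
-- so a dictionary spread over several files can be read in one go.
rows = LOAD 'rows' AS (line:chararray);
dict = LOAD 'tinyDictionary' AS (k:chararray, v:chararray);

-- Collapse the dictionary into one tuple that holds a single bag.
dict_all = GROUP dict ALL;

-- Add the same constant (synthetic) key to both sides.
rows_k = FOREACH rows GENERATE 1 AS jk, line;
dict_k = FOREACH dict_all GENERATE 1 AS jk, dict AS dict_bag;

-- Replicated join: the small relation goes last and is held in memory.
joined = JOIN rows_k BY jk, dict_k BY jk USING 'replicated';

result = FOREACH joined GENERATE
    udf.my_python_udf(rows_k::line, dict_k::dict_bag);
```

Since dict_k has exactly one tuple, the join output has the same number of tuples as rows, and the UDF sees the dictionary as a bag of (k, v) tuples.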

On Wed, Oct 23, 2013 at 2:09 PM, Daniel Dai wrote:
> Can you do a cross?
>
> On Mon, Oct 21, 2013 at 2:21 PM, Serega Sheypak wrote:
>
> > Hi, I have two relations:
> > relation *rows* (>10GB)
> > relation *tinyDictionary* (<1MB)
> >
> > I want to take each tuple from *rows* and attach *tinyDictionary* to it,
> > and then pass it to a Python UDF:
> >
> > result = FOREACH someRelation GENERATE
> >     udf.my_python_udf(single_row_from_*Rows*, whole*TinyDictionary*);
> >
> > How can I do that?
> >
> > There is a solution using DistributedCache, but I would like to
> > avoid using Java stuff.
> > Also, *TinyDictionary* could be in several files. It would be hard to deal
> > with it.
