Hi Arnaud,

The unmatched elements of A will only end up on the same worker node if
they all share the same key. Otherwise, they will be spread evenly
across your cluster. That said, I would also recommend using Flink's
leftOuterJoin.
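
To illustrate the point about keys, here is a minimal plain-Java sketch. It
is not Flink code: the class name PartitionSketch and the hash-modulo
assignment are assumptions made for illustration, not Flink's actual
partitioner. It counts how many records land on each of four hypothetical
workers when records are assigned by the hash of their key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: assigns records to workers by key hash (hypothetical scheme). */
final class PartitionSketch {

    /** Returns how many keys land on each of `parallelism` hypothetical workers. */
    static int[] countPerWorker(List<String> keys, int parallelism) {
        int[] counts = new int[parallelism];
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            counts[Math.floorMod(key.hashCode(), parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> distinct = new ArrayList<>();
        List<String> identical = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            distinct.add("key-" + i);   // unmatched records, each with its own key
            identical.add("same-key");  // unmatched records that all share one key
        }
        System.out.println("distinct keys:  " + Arrays.toString(countPerWorker(distinct, 4)));
        System.out.println("identical keys: " + Arrays.toString(countPerWorker(identical, 4)));
    }
}
```

With distinct keys, every worker receives a share of the records. Only when
all records carry the same key does a single worker receive everything.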

Cheers,
Till

On Sun, Jan 31, 2016 at 5:27 AM, Chiwan Park <chiwanp...@apache.org> wrote:

> Hi Arnaud,
>
> To join two datasets, the community recommends using the join operation
> rather than the coGroup operation. For a left join, you can use the
> leftOuterJoin method. Flink’s optimizer chooses the distributed join
> execution strategy based on statistics about the datasets, such as their
> size. Additionally, you can set a join hint to help the optimizer choose
> the strategy.
>
> In the transformations section [1] of the Flink documentation, you can
> find details about the outer join operation.
>
> I hope this helps.
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#transformations
>
> Regards,
> Chiwan Park
>
> > On Jan 30, 2016, at 6:43 PM, LINZ, Arnaud <al...@bouyguestelecom.fr>
> wrote:
> >
> > Hello,
> >
> > I have a very big dataset A to left join with a dataset B that is half
> > its size. That is to say, half of A's records will be matched with one
> > record of B, and the other half with null values.
> >
> > I used a CoGroup for that, but my batch fails because YARN kills the
> > container due to memory problems.
> >
> > I guess that’s because one worker will get half of dataset A (the
> > unmatched records), and that’s too much for a single JVM.
> >
> > Am I right in my diagnosis? Is there a better way to left join
> > unbalanced datasets?
> >
> > Best regards,
> >
> > Arnaud
>
>
