Hi Arnaud, the unmatched elements of A will only end up on the same worker node if they all share the same key. Otherwise, they will be spread evenly across your cluster. However, I would also recommend using Flink's leftOuterJoin.
Cheers,
Till

On Sun, Jan 31, 2016 at 5:27 AM, Chiwan Park <chiwanp...@apache.org> wrote:

> Hi Arnaud,
>
> To join two datasets, the community recommends using join operation rather
> than cogroup operation. For left join, you can use leftOuterJoin method.
> Flink’s optimizer decides distributed join execution strategy using some
> statistics of the datasets such as size of the dataset. Additionally, you
> can set join hint to help optimizer decide the strategy.
>
> In transformations section [1] of Flink documentation, you can find about
> outer join operation in detail.
>
> I hope this helps.
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#transformations
>
> Regards,
> Chiwan Park
>
> > On Jan 30, 2016, at 6:43 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >
> > Hello,
> >
> > I have a very big dataset A to left join with a dataset B that is half
> > its size. That is to say, half of A records will be matched with one
> > record of B, and the other half with null values.
> >
> > I used a CoGroup for that, but my batch fails because yarn kills the
> > container due to memory problems.
> >
> > I guess that’s because one worker will get half of A dataset (the
> > unmatched ones), and that’s too much for a single JVM
> >
> > Am I right in my diagnostic ? Is there a better way to left join
> > unbalanced datasets ?
> >
> > Best regards,
> >
> > Arnaud
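
The point about key distribution can be checked with a small stand-alone simulation. The sketch below is not Flink code: it assumes a simplified model in which each record is sent to the worker given by its key's hash modulo the number of workers, whether or not the key has a match on the other side. Flink's real partitioner is more involved, and the names `KeySkewSketch`, `partitionFor` and `recordsPerWorker` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of hash partitioning by key (an assumption, not Flink's implementation).
class KeySkewSketch {

    // Pick a worker for a key: hash modulo worker count, kept non-negative.
    static int partitionFor(Object key, int workers) {
        return Math.floorMod(key.hashCode(), workers);
    }

    // Count how many records each worker would receive for the given keys.
    static int[] recordsPerWorker(List<Integer> keys, int workers) {
        int[] counts = new int[workers];
        for (Integer key : keys) {
            counts[partitionFor(key, workers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int workers = 4;
        List<Integer> distinctKeys = new ArrayList<>();
        List<Integer> sharedKey = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            distinctKeys.add(i);   // unmatched records, each with its own key
            sharedKey.add(-1);     // unmatched records that all carry the same key
        }
        // Distinct keys spread out: prints [250, 250, 250, 250]
        System.out.println(Arrays.toString(recordsPerWorker(distinctKeys, workers)));
        // One shared key piles up on a single worker: prints [0, 0, 0, 1000]
        System.out.println(Arrays.toString(recordsPerWorker(sharedKey, workers)));
    }
}
```

Under this model, unmatched records of A only concentrate on one worker when they share a key (for example a common placeholder value); being unmatched does not by itself cause skew.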