Hi! YARN killing the application seems strange. The memory use that YARN sees should not change even when one node gets a lot of data.
Can you share what version of Flink (plus commit hash) you are using and
whether you use off-heap memory or not?

Thanks,
Stephan

On Sun, Jan 31, 2016 at 10:47 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Arnaud,
>
> the unmatched elements of A will only end up on the same worker node if
> they all share the same key. Otherwise, they will be evenly spread out
> across your cluster. However, I would also recommend you to use Flink's
> leftOuterJoin.
>
> Cheers,
> Till
>
> On Sun, Jan 31, 2016 at 5:27 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>
>> Hi Arnaud,
>>
>> To join two datasets, the community recommends using join operation
>> rather than cogroup operation. For left join, you can use leftOuterJoin
>> method. Flink’s optimizer decides distributed join execution strategy using
>> some statistics of the datasets such as size of the dataset. Additionally,
>> you can set join hint to help optimizer decide the strategy.
>>
>> In transformations section [1] of Flink documentation, you can find about
>> outer join operation in detail.
>>
>> I hope this helps.
>>
>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#transformations
>>
>> Regards,
>> Chiwan Park
>>
>> > On Jan 30, 2016, at 6:43 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>> >
>> > Hello,
>> >
>> > I have a very big dataset A to left join with a dataset B that is half
>> > its size. That is to say, half of A records will be matched with one record
>> > of B, and the other half with null values.
>> >
>> > I used a CoGroup for that, but my batch fails because yarn kills the
>> > container due to memory problems.
>> >
>> > I guess that’s because one worker will get half of A dataset (the
>> > unmatched ones), and that’s too much for a single JVM
>> >
>> > Am I right in my diagnostic ? Is there a better way to left join
>> > unbalanced datasets ?
>> >
>> > Best regards,
>> >
>> > Arnaud
>> >
>> > The integrity of this message cannot be guaranteed on the Internet. The
>> > company that sent this message cannot therefore be held liable for its
>> > content nor attachments. Any unauthorized use or dissemination is
>> > prohibited. If you are not the intended recipient of this message, then
>> > please delete it and notify the sender.
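
To illustrate Till's point about where the unmatched records end up, here is a minimal sketch in plain Python. It is not Flink code: the function name `partition_counts` and the modulo routing are assumptions made for illustration, standing in for whatever hash partitioner the runtime really uses. It only shows that records routed by key spread across workers when the keys differ, and pile up on one worker when they all share a key.

```python
from collections import Counter


def partition_counts(keys, n_workers):
    """Count how many records each worker receives when records are routed by key."""
    # Hypothetical stand-in for a hash partitioner: integer keys are routed
    # by key modulo the number of workers.
    counts = Counter(key % n_workers for key in keys)
    return [counts.get(worker, 0) for worker in range(n_workers)]


# Unmatched records with distinct keys are spread across all workers.
distinct = partition_counts(range(1000), 4)

# Unmatched records that all share one key land on a single worker.
skewed = partition_counts([42] * 1000, 4)

print(distinct)
print(skewed)
```

So the unmatched half of A would only overload a single worker in the second case, where every unmatched record carries the same key.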