Re: Left join with unbalanced dataset

2016-01-30, Chiwan Park
Hi Arnaud,

To join two datasets, the community recommends using the join operation rather than the coGroup operation. For a left join, you can use the leftOuterJoin method. Flink’s optimizer chooses the distributed join execution strategy based on statistics of the datasets, such as their size. Additio
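For readers less familiar with what a hash-based left outer join does, here is a minimal, single-machine sketch in plain Java. It is not Flink code: the names `HashLeftOuterJoinSketch`, `Rec` and `leftOuterJoin` are hypothetical, and the whole build side is assumed to fit in memory, whereas Flink's runtime can spill hash tables to disk. The sketch builds a hash table on the smaller input B and streams the larger input A through it once, emitting `null` for every A record without a match. In the DataSet API itself, the call would look roughly like `a.leftOuterJoin(b).where(0).equalTo(0).with(joinFunction)`, and a hint such as `JoinHint.REPARTITION_HASH_SECOND` can be passed as a second argument to `leftOuterJoin` to ask for B to be used as the hash table side instead of leaving the choice to the optimizer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashLeftOuterJoinSketch {

    /** A (key, value) record; a hypothetical stand-in for a Flink Tuple2. */
    static final class Rec {
        final long key;
        final String value;

        Rec(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Left outer join of a (probe side) with b (build side).
     * Each output row is "key,aValue,bValue", with "null" when A has no match.
     */
    static List<String> leftOuterJoin(List<Rec> a, List<Rec> b) {
        // Build phase: index the smaller input B by key (in memory only).
        Map<Long, List<Rec>> table = new HashMap<>();
        for (Rec r : b) {
            table.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        // Probe phase: stream A once; every A record yields at least one row.
        List<String> out = new ArrayList<>();
        for (Rec left : a) {
            List<Rec> matches = table.get(left.key);
            if (matches == null) {
                out.add(left.key + "," + left.value + ",null");
            } else {
                for (Rec right : matches) {
                    out.add(left.key + "," + left.value + "," + right.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> a = List.of(new Rec(1, "a1"), new Rec(2, "a2"),
                              new Rec(3, "a3"), new Rec(4, "a4"));
        List<Rec> b = List.of(new Rec(1, "b1"), new Rec(3, "b3"));
        leftOuterJoin(a, b).forEach(System.out::println);
    }
}
```

Running `main` prints one row per A record: keys 1 and 3 paired with their B values, keys 2 and 4 paired with `null`. Only B is held in the hash table while A is read once and never buffered, which is why the smaller input is the natural build side for a left outer join.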

Left join with unbalanced dataset

2016-01-30, LINZ, Arnaud
Hello,

I have a very big dataset A to left join with a dataset B that is half its size. That is to say, half of the records of A will each be matched with one record of B, and the other half with null values. I used a CoGroup for that, but my batch job fails because YARN kills the container due to memory prob
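A CoGroup-based left join can be pictured like this: both inputs are grouped by key, and a user function receives, for each key, the group of A records and the (possibly empty) group of B records, emitting `null` on the B side when that group is empty. The sketch below mimics this in plain Java on a single machine by sorting both inputs and walking them in key order. It is an illustration under that assumption, not Flink's `CoGroupFunction` API; the names `CoGroupLeftJoinSketch`, `Rec`, `leftJoinGroup` and `leftJoin` are hypothetical, and the sort-then-merge walk is chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CoGroupLeftJoinSketch {

    /** A (key, value) record; a hypothetical stand-in for a Flink Tuple2. */
    static final class Rec {
        final long key;
        final String value;

        Rec(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Per-key step: pair every A record with every B record, or with null. */
    static void leftJoinGroup(List<Rec> aGroup, List<Rec> bGroup, List<String> out) {
        for (Rec left : aGroup) {
            if (bGroup.isEmpty()) {
                out.add(left.key + "," + left.value + ",null");
            } else {
                for (Rec right : bGroup) {
                    out.add(left.key + "," + left.value + "," + right.value);
                }
            }
        }
    }

    /** Sort both inputs by key, then hand each key's two groups to leftJoinGroup. */
    static List<String> leftJoin(List<Rec> a, List<Rec> b) {
        List<Rec> sa = new ArrayList<>(a);
        List<Rec> sb = new ArrayList<>(b);
        Comparator<Rec> byKey = Comparator.comparingLong(r -> r.key);
        sa.sort(byKey); // stable sort: equal keys keep their input order
        sb.sort(byKey);

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < sa.size()) {
            long key = sa.get(i).key;
            // Collect the A group for this key.
            List<Rec> aGroup = new ArrayList<>();
            while (i < sa.size() && sa.get(i).key == key) {
                aGroup.add(sa.get(i++));
            }
            // Skip B keys that have no A partner (not needed for a left join).
            while (j < sb.size() && sb.get(j).key < key) {
                j++;
            }
            // Collect the matching B group, which may be empty.
            List<Rec> bGroup = new ArrayList<>();
            while (j < sb.size() && sb.get(j).key == key) {
                bGroup.add(sb.get(j++));
            }
            leftJoinGroup(aGroup, bGroup, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> a = List.of(new Rec(3, "a3"), new Rec(1, "a1"), new Rec(2, "a2"));
        List<Rec> b = List.of(new Rec(3, "b3"), new Rec(1, "b1"));
        leftJoin(a, b).forEach(System.out::println);
    }
}
```

In this sketch the per-key function only ever sees the two groups for a single key, which is the property that lets a distributed implementation process groups one at a time.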