Re: Left join with unbalanced dataset

2016-01-30 Thread Chiwan Park
Hi Arnaud, To join two datasets, the community recommends using the join operation rather than the cogroup operation. For a left join, you can use the leftOuterJoin method. Flink’s optimizer decides the distributed join execution strategy using statistics of the datasets, such as their size. Additio

Left join with unbalanced dataset

2016-01-30 Thread LINZ, Arnaud
Hello, I have a very big dataset A to left join with a dataset B that is half its size. That is to say, half of A's records will be matched with one record of B, and the other half with null values. I used a CoGroup for that, but my batch fails because YARN kills the container due to memory prob
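
For readers skimming the thread, the left join semantics described above (every record of A is kept, and records without a match in B are paired with a null) can be sketched in plain Python. This is only an illustration of the semantics, not Flink code: the function name `left_outer_join`, the sample records, and the choice to build the lookup table on the smaller dataset B are assumptions made for this example and say nothing about which strategy Flink's optimizer would actually pick.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[int, str]
Joined = Tuple[int, str, Optional[str]]


def left_outer_join(a: Iterable[Record], b: Iterable[Record]) -> List[Joined]:
    """Keep every record of `a`; pair it with matching `b` values or None."""
    # Build an in-memory lookup table on b (assumed to be the smaller side).
    b_by_key: Dict[int, List[str]] = {}
    for key, b_value in b:
        b_by_key.setdefault(key, []).append(b_value)

    result: List[Joined] = []
    # Stream over a and probe the table once per record.
    for key, a_value in a:
        matches = b_by_key.get(key)
        if matches:
            for b_value in matches:
                result.append((key, a_value, b_value))
        else:
            # No match in b: the right side is null (None).
            result.append((key, a_value, None))
    return result


# Hypothetical sample data: half of A's keys have exactly one match in B.
a = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
b = [(1, "b1"), (3, "b3")]
joined = left_outer_join(a, b)
```

Only B has to fit in memory in this sketch, while A is read one record at a time, which is why the size of each side matters when a join strategy is chosen.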