Re: Skewed Data when joining tables using Flink SQL

2022-01-07 Thread Anne Lai
> Best,
> Yun
>
> [1] https://issues.apache.org/jira/browse/FLINK-20491
>
> ------Original Mail --
> *Sender:* Anne Lai
> *Send Date:* Fri Jan 7 17:20:58 2022
> *Recipients:* User
> *Subject:* Skewed Data when joining tables using Flink SQL

Skewed Data when joining tables using Flink SQL

2022-01-07 Thread Anne Lai
Hi, I have a Flink batch job that needs to join a large, skewed table with a smaller table. Because records are not evenly distributed across the subtasks, it always fails with a "too much data in partition" error. I explored using the DataStream API to broadcast the smaller tables as a broadcast state