Hey everyone,

I'm trying to use Flink SQL to construct a set of transformations for my
application. Let's say the topology just has three steps (sketched below):

- SQL Source
- SQL SELECT statement
- SQL Sink (via INSERT)
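
For concreteness, a pipeline of that shape might look roughly like the
sketch below. Everything in it (table names, schema, the Kafka source and
the Postgres JDBC options) is just a placeholder, not my actual job:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Minimal sketch of the three-step pipeline; names and options are placeholders.
public class PlainInsertSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // 1) SQL source (Kafka here, only as an example)
        tEnv.executeSql(
            "CREATE TABLE orders_src (id BIGINT, amount DECIMAL(10, 2)) WITH ("
                + " 'connector' = 'kafka',"
                + " 'topic' = 'orders',"
                + " 'properties.bootstrap.servers' = 'localhost:9092',"
                + " 'scan.startup.mode' = 'earliest-offset',"
                + " 'format' = 'json')");

        // JDBC sink keyed on the PK
        tEnv.executeSql(
            "CREATE TABLE orders_sink (id BIGINT, amount DECIMAL(10, 2),"
                + " PRIMARY KEY (id) NOT ENFORCED) WITH ("
                + " 'connector' = 'jdbc',"
                + " 'url' = 'jdbc:postgresql://localhost:5432/mydb',"
                + " 'table-name' = 'orders')");

        // 2) + 3) plain SELECT feeding the sink via INSERT -- no shuffle is
        // introduced here, so rows with the same id can land on different
        // sink subtasks.
        tEnv.executeSql("INSERT INTO orders_sink SELECT id, amount FROM orders_src");
    }
}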

The sink I'm using (JDBC) would really benefit from the data being
partitioned by the PK ID, to avoid conflicting transactions and deadlocks.
I can force Flink to partition the data by the PK ID before the INSERT by
dropping down to the DataStream API and using keyBy, then converting the
DataStream back into a Table again...
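
This is roughly the workaround I have in mind: drop to the DataStream API
only to key the stream by the PK, then convert it straight back to a Table
for the INSERT. Again just a sketch, assuming the same placeholder
orders_src / orders_sink tables as above are already registered:

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

// Sketch of the keyBy workaround described above.
public class KeyedInsertSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // ... CREATE TABLE orders_src / orders_sink as in the sketch above ...

        // The plain SELECT, still on the SQL side.
        Table selected = tEnv.sqlQuery("SELECT id, amount FROM orders_src");

        // Drop to the DataStream API just to hash-partition by the PK
        // (id is the first column of the SELECT).
        DataStream<Row> keyed = tEnv.toDataStream(selected)
                .keyBy(new KeySelector<Row, Long>() {
                    @Override
                    public Long getKey(Row row) {
                        return (Long) row.getField(0);
                    }
                });

        // ... then immediately convert back to a Table so the INSERT runs
        // with all rows of a given id handled by the same sink subtask.
        Table repartitioned = tEnv.fromDataStream(keyed);
        repartitioned.executeInsert("orders_sink");
    }
}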

Is there a simpler way to do this? I understand that, for example, a GROUP
BY statement will probably introduce a similar shuffle, but what if I have
a plain SELECT followed by an INSERT?

Thank you!
