Hi again,

Maybe you can use *table.exec.sink.keyed-shuffle*
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-sink-keyed-shuffle)
and set it to *FORCE*, which will use the primary key column(s) to partition
and distribute the data.
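
For example, a rough, untested sketch using the Java Table API (the option
exists in Flink 1.13+; the table names below are just placeholders, and a
PRIMARY KEY must be declared on the sink table for the keyed shuffle to have
key columns to hash on). In the SQL client the equivalent would be:
SET 'table.exec.sink.keyed-shuffle' = 'FORCE';

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Ask the planner to add a hash shuffle on the sink's primary key
    // columns, so rows with the same key always go to the same sink subtask.
    tEnv.getConfig().getConfiguration()
        .setString("table.exec.sink.keyed-shuffle", "FORCE");

    // Placeholder statement: jdbc_sink / source_table are assumed names.
    tEnv.executeSql("INSERT INTO jdbc_sink SELECT id, name FROM source_table");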

On Fri, Apr 1, 2022 at 6:52 PM Marios Trivyzas <mat...@gmail.com> wrote:

> Hi!
>
> I don't think there is a way to achieve that without resorting to
> DataStream API.
> I don't know if using the PARTITIONED BY clause in the table's CREATE
> statement can help to "balance" the data, see
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#partitioned-by
> .
>
>
> On Thu, Mar 31, 2022 at 7:18 AM Yaroslav Tkachenko <yaros...@goldsky.io>
> wrote:
>
>> Hey everyone,
>>
>> I'm trying to use Flink SQL to construct a set of transformations for my
>> application. Let's say the topology just has three steps:
>>
>> - SQL Source
>> - SQL SELECT statement
>> - SQL Sink (via INSERT)
>>
>> The sink I'm using (JDBC) would really benefit from data partitioning (by
>> PK ID) to avoid conflicting transactions and deadlocks. I can force Flink
>> to partition the data by the PK ID before the INSERT by resorting to the
>> DataStream API and leveraging the keyBy method, then transforming the
>> DataStream back into a Table again...
>>
>> Is there a simpler way to do this? I understand that, for example, a
>> GROUP BY statement will probably perform similar data shuffling, but what
>> if I have a simple SELECT followed by INSERT?
>>
>> Thank you!
>>
>
>
> --
> Marios
>


Best,
Marios
