Hi Piotr! Yes, that's what I'm using with DataStream. It works well in my prototype.
On Wed, Sep 16, 2020 at 8:58 AM Piotr Nowojski <pnowoj...@apache.org> wrote:

> Hi,
>
> Have you seen the "Reinterpreting a pre-partitioned data stream as keyed
> stream" feature? [1] However, I'm not sure if and how it can be integrated
> with the Table API. Maybe someone more familiar with the Table API can help
> with that?
>
> Piotrek
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/experimental.html#reinterpreting-a-pre-partitioned-data-stream-as-keyed-stream
>
> On Wed, Sep 16, 2020 at 05:35 Dan Hill <quietgol...@gmail.com> wrote:
>
>> How do I avoid unnecessary reshuffles when using Kafka as input? My keys
>> in Kafka are ~userId. The first few stages do joins that are usually on
>> (userId, someOtherKeyId). It makes sense for these joins to stay on the
>> same machine and avoid unnecessary shuffling.
>>
>> What's the best way to avoid unnecessary shuffling when using the Table
>> SQL interface? I see PARTITION BY on TABLE. I'm not sure how to specify
>> the keys for Kafka.
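[Editor's note for readers of this thread: the DataStream feature referenced above is `DataStreamUtils.reinterpretAsKeyedStream`, which tells Flink to trust that the stream is already partitioned by the given key, skipping the hash shuffle that a `keyBy` would normally insert. The sketch below is not Flink code; it is a plain-Java illustration of the invariant that makes this safe: if both inputs were written with the same partitioner on the same key (e.g. Kafka records keyed by userId), matching records are already co-located and can be joined per partition with no data movement. The class and method names (`CoPartitionedJoin`, `partition`, `partitionBy`) are illustrative, not part of any API.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoPartitionedJoin {
    static final int PARTITIONS = 4;

    // Both sides must use the SAME partitioner on the SAME key field;
    // this mirrors keying Kafka records by userId on the producer side.
    static int partition(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Bucket records by the partition their key (element [0]) hashes to.
    static Map<Integer, List<String[]>> partitionBy(List<String[]> records) {
        Map<Integer, List<String[]>> out = new HashMap<>();
        for (String[] r : records) {
            out.computeIfAbsent(partition(r[0]), p -> new ArrayList<>()).add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // (userId, event) and (userId, attribute) records, keyed by userId.
        List<String[]> events = List.of(
                new String[]{"user1", "click"},
                new String[]{"user2", "view"});
        List<String[]> attrs = List.of(
                new String[]{"user1", "premium"},
                new String[]{"user2", "free"});

        Map<Integer, List<String[]>> lhs = partitionBy(events);
        Map<Integer, List<String[]>> rhs = partitionBy(attrs);

        // Join each partition locally: no record ever crosses partitions,
        // because equal keys hash to the same partition on both sides.
        for (int p = 0; p < PARTITIONS; p++) {
            for (String[] e : lhs.getOrDefault(p, List.of())) {
                for (String[] a : rhs.getOrDefault(p, List.of())) {
                    if (e[0].equals(a[0])) {
                        System.out.println(e[0] + ": " + e[1] + " + " + a[1]);
                    }
                }
            }
        }
    }
}
```

The caveat Piotr raises still applies: this only holds if Flink's notion of the partition matches how the data was actually written. Kafka's producer partitioner and Flink's key-group assignment are different hash schemes, which is why reinterpreting a pre-partitioned stream as keyed is an experimental, use-with-care feature.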