Hi Avi,

I'd definitely go for approach #1.
Flink will hash-partition the records across all parallel tasks. This is
basically the same as a distributed key-value store sharding its keys.
I would not try to fine-tune the partitioning. Instead, you should use as
many distinct keys as possible to ensure an even distribution of keys. This
will also allow you to scale your application to more tasks later.
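To illustrate why many distinct keys even out the load, here is a minimal
sketch of hash partitioning. Note this is not Flink's actual assignment
logic (Flink murmur-hashes each key into a key group and then maps key
groups to parallel tasks); the simple modulo scheme and the key names below
are only illustrative assumptions.

```python
import zlib
from collections import Counter

NUM_TASKS = 4

def partition(key: str, num_tasks: int = NUM_TASKS) -> int:
    # Stand-in for Flink's internal key-to-task assignment: hash the key
    # deterministically and take it modulo the number of parallel tasks.
    return zlib.crc32(key.encode()) % num_tasks

# With many distinct keys, the load per task is roughly even.
keys = [f"user-{i}" for i in range(10_000)]
load = Counter(partition(k) for k in keys)
print(dict(load))
```

With only a handful of distinct keys the same scheme can leave some tasks
idle, which is why skewed or low-cardinality keys partition poorly.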

Best, Fabian

On Tue, Nov 27, 2018 at 05:21, yinhua.dai <
yinhua.2...@outlook.com> wrote:

> Generally approach #1 is OK, but you may have to use a hash-based key
> selector if you have heavy data skew.
>
>
>
>
