Hi Avi,

I'd definitely go for approach #1. Flink will hash-partition the records across all nodes; this is basically the same as a distributed key-value store sharding its keys. I would not try to fine-tune the partitioning. You should try to use as many distinct keys as possible to ensure an even distribution of keys. This will also allow you to scale your application to more tasks later.
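To make the point concrete, here is a minimal sketch (not Flink's actual implementation, which routes keys through key groups and a murmur hash) of why many distinct keys spread evenly over parallel tasks with plain hash partitioning. The class and method names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class HashPartitionSketch {

    // Illustrative only: assign a key to one of `parallelism` tasks by hashing.
    // Flink's keyBy() follows the same spirit, but via key groups and murmur hashing.
    static int taskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        Map<Integer, Integer> counts = new HashMap<>();
        // With many distinct keys, records spread roughly evenly across tasks.
        for (int k = 0; k < 10_000; k++) {
            counts.merge(taskFor("key-" + k, parallelism), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

With only a handful of distinct keys the same scheme can leave some tasks idle, which is why using many keys matters.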
Best,
Fabian

On Tue, Nov 27, 2018 at 05:21, yinhua.dai <yinhua.2...@outlook.com> wrote:
> In general approach #1 is OK, but you may have to use a hash-based key
> selector if you have heavy data skew.
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/