I don’t think there are any particular implications. I would suggest going for
a simple keyBy and thinking about optimization only if there is actually a
problem at hand.
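For illustration, a minimal sketch of that simple approach (the tuple schema and the sum aggregation are assumptions made up for this sketch, not taken from the thread):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SimpleKeyByJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("user-a", 1L),
                Tuple2.of("user-b", 1L),
                Tuple2.of("user-a", 1L))
            // one shuffle: records are partitioned by the key field
            .keyBy(0)
            // running sum per key, kept in Flink-managed keyed state
            .sum(1)
            .print();

        env.execute("simple-keyby-sketch");
    }
}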
Best,
Stefan
> On 03.04.2018 at 17:08, Timo Walther wrote:
>
> @Richter: Are you aware of any per-key state size performance implications?
@Richter: Are you aware of any per-key state size performance implications?
On 03.04.18 at 16:56, au.fp2018 wrote:
Thanks Timo/LiYue, your responses were helpful.
I was worried about the network shuffle with the second keyBy. The first
keyBy is indeed evenly spreading the load across the nodes. As I mentioned,
my concern was around the amount of state in each key. Maybe I am trying to
optimize prematurely here.
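For context, a rough sketch of the kind of two-stage pipeline being discussed (the tuple schema and the placeholder aggregations are assumptions; the only point is that each keyBy is a separate shuffle):

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TwoStageKeyByJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (fineGrainedKey, coarseKey, value) -- placeholder schema
        env.fromElements(
                Tuple3.of("device-1", "user-a", 1L),
                Tuple3.of("device-2", "user-a", 2L),
                Tuple3.of("device-1", "user-b", 3L))
            // first keyBy: shuffle #1, many distinct keys, load spreads evenly
            .keyBy(0)
            .sum(2)   // placeholder aggregation per fine-grained key
            // second keyBy: shuffle #2, fewer distinct keys, more state per key
            .keyBy(1)
            .sum(2)   // placeholder aggregation per coarse key
            .print();

        env.execute("two-stage-keyby-sketch");
    }
}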
Hi Andre,
every keyBy is a shuffle over the network and thus introduces some
overhead, especially the serialization of records between operators when
object reuse is disabled (which it is by default). If you think that the
slots (and thus the nodes) are not occupied evenly by the first keyBy
operation (e.g.
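On the object reuse point: the relevant switch is ExecutionConfig#enableObjectReuse (shown below as a fragment, not a complete job); note that it only avoids copies between operators in the same chain, a keyBy still serializes records for the network shuffle:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Object reuse is disabled by default. Enabling it avoids the defensive
// copies between chained operators; it does not remove the serialization
// a keyBy (network shuffle) always requires, and it is only safe if user
// functions neither cache nor mutate their input records.
env.getConfig().enableObjectReuse();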
Hello,
In my opinion, it would be meaningful only in this situation:
1. The total size of all your state is huge enough, e.g. 1GB+.
2. Splitting your job into multiple keyBy operations would reduce the size of
your state.
Because the operation of saving state is synchronized and all working threads