Re: Increase in parallelism has very bad impact on performance

2020-11-04 Thread Arvid Heise
> on't think it's very probable for 2 different countries to have the same hash, but I know for a fact that the number of events is not evenly distributed between countries. But still, why does the impact in performance appear only for higher parallelism?
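The point about unevenly distributed countries can be illustrated with a small simulation. This is only a sketch under stated assumptions: the per-country event counts are made up, and `zlib.crc32` modulo the parallelism is a simplified stand-in for how a keyed stream routes keys to subtasks (Flink itself hashes keys into key groups, which is not reproduced here). It shows that when one key dominates, the busiest subtask keeps roughly the same share of the work no matter how high the parallelism is.

```python
import zlib


def subtask_for(key: str, parallelism: int) -> int:
    # Simplified stand-in for keyed routing (assumption): a stable hash of
    # the key modulo the parallelism. Not Flink's real key-group assignment.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def load_per_subtask(events_per_key: dict, parallelism: int) -> list:
    # Sum the events of every key that lands on the same subtask.
    load = [0] * parallelism
    for key, count in events_per_key.items():
        load[subtask_for(key, parallelism)] += count
    return load


def max_load_share(events_per_key: dict, parallelism: int) -> float:
    # Fraction of all events handled by the busiest subtask.
    load = load_per_subtask(events_per_key, parallelism)
    return max(load) / sum(load)


# Hypothetical, deliberately skewed distribution of events per country.
EVENTS_PER_COUNTRY = {"US": 5000, "IN": 500, "DE": 300, "FR": 200}

if __name__ == "__main__":
    for parallelism in (1, 2, 4, 8):
        share = max_load_share(EVENTS_PER_COUNTRY, parallelism)
        print(f"parallelism={parallelism}: busiest subtask share={share:.2f}")
```

With this made-up data the busiest subtask always handles at least 5000 of the 6000 events, so adding subtasks mostly adds idle ones. Skew on its own explains a plateau in throughput, not necessarily a drop.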

Re: Increase in parallelism has very bad impact on performance

2020-11-04 Thread Sidney Feiner
d Heise Sent: Tuesday, November 3, 2020 8:54 PM To: Sidney Feiner Cc: Yangze Guo ; user@flink.apache.org Subject: Re: Increase in parallelism has very bad impact on performance Hi Sidney, you might recheck your first message. Either it's incorrectly written or you are a victim of a fa

Re: Increase in parallelism has very bad impact on performance

2020-11-03 Thread Arvid Heise
> ormance appear only for higher parallelism? > ---------- > From: Arvid Heise > Sent: Tuesday, November 3, 2020 12:09 PM

Re: Increase in parallelism has very bad impact on performance

2020-11-03 Thread Sidney Feiner
angze Guo Cc: Sidney Feiner ; user@flink.apache.org Subject: Re: Increase in parallelism has very bad impact on performance Hi Sidney, there could be a couple of reasons where scaling actually hurts. Let's include them one by one. First, you need to make sure that your source actually s

Re: Increase in parallelism has very bad impact on performance

2020-11-03 Thread Sidney Feiner
From: Yangze Guo Sent: Tuesday, November 3, 2020 5:00 AM To: Sidney Feiner Cc: user@flink.apache.org Subject: Re: Increase in parallelism has very bad impact on performance Hi, Sidney, What is the data generation rate of your

Re: Increase in parallelism has very bad impact on performance

2020-11-03 Thread Arvid Heise
Hi Sidney, there could be a couple of reasons why scaling actually hurts. Let's include them one by one. First, you need to make sure that your source actually supports scaling. Thus, your Kafka topic needs at least as many partitions as the parallelism you want to scale to. So if you want to scale at some point
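A minimal sketch of why the partition count caps source scaling, assuming a plain round-robin assignment of partitions to source subtasks (a simplification for illustration; the actual assignment strategy of Flink's Kafka connector is not reproduced here, and the function names are hypothetical):

```python
def assign_partitions(num_partitions: int, parallelism: int) -> list:
    # Round-robin assignment (assumption): partition p goes to subtask
    # p % parallelism. Each partition is read by exactly one subtask.
    assignment = [[] for _ in range(parallelism)]
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(num_partitions: int, parallelism: int) -> int:
    # Source subtasks that received no partition have nothing to read.
    return sum(1 for parts in assign_partitions(num_partitions, parallelism)
               if not parts)


if __name__ == "__main__":
    # Hypothetical topic with 4 partitions read at increasing parallelism.
    for parallelism in (1, 2, 4, 8):
        print(parallelism, assign_partitions(4, parallelism),
              "idle:", idle_subtasks(4, parallelism))
```

Under this assumption, a topic with 4 partitions read at parallelism 8 leaves 4 source subtasks without any input, so the extra parallelism cannot raise the read rate.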

Re: Increase in parallelism has very bad impact on performance

2020-11-02 Thread Yangze Guo
Hi Sidney, what is the data generation rate of your Kafka topic? Is it a lot bigger than 6000? Best, Yangze Guo On Tue, Nov 3, 2020 at 8:45 AM Sidney Feiner wrote: > Hey, I'm writing a Flink app that does some transformation on an event consumed from Kafka and then cr

Increase in parallelism has very bad impact on performance

2020-11-02 Thread Sidney Feiner
Hey, I'm writing a Flink app that does some transformation on an event consumed from Kafka, then creates time windows keyed by some field and applies an aggregation on all those events. When I run it with parallelism 1, I get a throughput of around 1.6K events per second (so also 1.6K events p