Hi, how are you measuring latency? Is it latency within a Flink job or from Kafka to Kafka? The first seems more likely, but I'm still interested in the details.
Best,
Aljoscha

> On 3. Jan 2018, at 08:13, Netzer, Liron <liron.net...@citi.com> wrote:
>
> Hi group,
>
> We have a standalone Flink cluster that is running on a UNIX host with 40
> CPUs and 256GB RAM.
> There is one task manager, and 24 slots were defined.
> When we decrease the parallelism of the stream graph operators (each operator
> has the same parallelism), we see a consistent improvement in latency:
>
> Test run   Parallelism   99 percentile   95 percentile   75 percentile   Mean
> #1         8             4.15 ms         2.02 ms         0.22 ms         0.42 ms
> #2         7             3.6 ms          1.68 ms         0.14 ms         0.34 ms
> #3         6             3 ms            1.4 ms          0.13 ms         0.3 ms
> #4         5             2.1 ms          0.95 ms         0.1 ms          0.22 ms
> #5         4             1.5 ms          0.64 ms         0.09 ms         0.16 ms
>
> This was a surprise for us, as we expected that higher parallelism would
> yield better latency.
> Could you help us understand this behavior?
> I know that when more threads are involved there is probably more
> serialization/deserialization, but that can't be the only reason for this
> behavior.
>
> We have two Kafka sources, and the rest of the operators are fixed windows,
> flatMaps, CoMaps, and several keyBys.
> Except for the Kafka sources and some internal logging, there is no other I/O
> (i.e. we do not connect to any external DB/queue).
> We use Flink 1.3.
>
> Thanks,
> Liron
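As a side note on the numbers above: the message doesn't say how the percentile figures were derived, so here is a minimal, hedged sketch of one common way to compute them from raw latency samples (nearest-rank method; the class name, sample values, and method are illustrative assumptions, not taken from the thread):

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] sortedMs, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMs.length);
        return sortedMs[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical per-record latencies in milliseconds.
        double[] latenciesMs = {0.09, 0.12, 0.16, 0.2, 0.3, 0.64, 0.8, 1.1, 1.5, 2.0};
        Arrays.sort(latenciesMs); // percentile() requires sorted input
        System.out.println("p99=" + percentile(latenciesMs, 99)); // prints p99=2.0
        System.out.println("p95=" + percentile(latenciesMs, 95)); // prints p95=2.0
        System.out.println("p75=" + percentile(latenciesMs, 75)); // prints p75=1.1
    }
}
```

With only 10 samples the 99th and 95th percentiles collapse onto the maximum, which is why comparisons like the table above need a large sample count per run to be meaningful.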