Hi Aljoscha,

The latency is measured with a Flink MetricGroup (specifically with a DropwizardHistogram). It is measured from message read time (i.e., from when the message is pulled from the Kafka source) until the last operator completes its processing (there is no Kafka sink).
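For reference, roughly how the histogram is wired up (a simplified sketch, not our actual code: the operator, metric name, tuple layout, and reservoir size are illustrative; it assumes the flink-metrics-dropwizard dependency is on the classpath):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
import org.apache.flink.metrics.Histogram;

import com.codahale.metrics.SlidingWindowReservoir;

// Hypothetical last operator in the chain: f0 carries the timestamp (ms)
// taken when the message was pulled from the Kafka source.
public class LatencyMeasuringMap
        extends RichMapFunction<Tuple2<Long, String>, Tuple2<Long, String>> {

    private transient Histogram latency;

    @Override
    public void open(Configuration parameters) {
        // Register a Dropwizard histogram through Flink's MetricGroup so the
        // percentiles/mean above can be read off the reporter.
        latency = getRuntimeContext().getMetricGroup().histogram(
                "endToEndLatencyMs",
                new DropwizardHistogramWrapper(
                        new com.codahale.metrics.Histogram(
                                new SlidingWindowReservoir(500))));
    }

    @Override
    public Tuple2<Long, String> map(Tuple2<Long, String> value) {
        // Latency = now - time the message was read from Kafka.
        latency.update(System.currentTimeMillis() - value.f0);
        return value;
    }
}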
Thanks,
Liron

From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: Wednesday, January 03, 2018 3:03 PM
To: Netzer, Liron [ICG-IT]
Cc: user@flink.apache.org
Subject: Re: Lower Parallelism derives better latency

Hi,

How are you measuring latency? Is it latency within a Flink job or from Kafka to Kafka? The first seems more likely, but I'm still interested in the details.

Best,
Aljoscha

On 3. Jan 2018, at 08:13, Netzer, Liron <liron.net...@citi.com> wrote:

Hi group,

We have a standalone Flink cluster running on a UNIX host with 40 CPUs and 256 GB RAM. There is one task manager, and 24 slots are defined. When we decrease the parallelism of the stream graph operators (each operator has the same parallelism; see the simplified job sketch below), we see a consistent improvement in latency:

Test run   Parallelism   99th percentile   95th percentile   75th percentile   Mean
#1         8             4.15 ms           2.02 ms           0.22 ms           0.42 ms
#2         7             3.6 ms            1.68 ms           0.14 ms           0.34 ms
#3         6             3 ms              1.4 ms            0.13 ms           0.3 ms
#4         5             2.1 ms            0.95 ms           0.1 ms            0.22 ms
#5         4             1.5 ms            0.64 ms           0.09 ms           0.16 ms

This was a surprise for us, as we expected that higher parallelism would yield lower latency. Could you help us understand this behavior? I know that when more threads are involved there is probably more serialization/deserialization, but that can't be the only reason for this behavior.

We have two Kafka sources, and the rest of the operators are fixed windows, flatMaps, coMappers, and several keyBys. Except for the Kafka sources and some internal logging, there is no other I/O (i.e., we do not connect to any external DB/queue).

We use Flink 1.3.

Thanks,
Liron
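Simplified sketch of the job setup we vary between runs (bootstrap servers, topic names, and the pipeline body are placeholders, not our real job; only the parallelism knob matters here):

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ParallelismLatencyTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // The single knob varied between test runs: 8, 7, 6, 5, 4.
        // Every operator inherits this default parallelism.
        env.setParallelism(4);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder

        // Two Kafka sources (topic names are placeholders).
        DataStream<String> a = env.addSource(
                new FlinkKafkaConsumer010<>("topic-a", new SimpleStringSchema(), props));
        DataStream<String> b = env.addSource(
                new FlinkKafkaConsumer010<>("topic-b", new SimpleStringSchema(), props));

        // Real pipeline (keyBys, fixed windows, flatMaps, coMappers) elided;
        // a trivial stand-in so the sketch runs end to end.
        a.union(b).print();

        env.execute("parallelism-latency-test");
    }
}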