Hi,

We are experience issues scaling our Flink application and we have observed
that it may be because Kafka messages consumption is not balanced across
partitions. The attached image (lag per partition) shows how only one
partition consumes messages (the blue one in the back) and it wasn't until
it finished that the other ones started to consume at a good rate (actually
the total throughput multiplied by 4 when these started) . Also, when that
ones started to consume, one partition just stopped an accumulated messages
back again until they finished.

We don't see any resource (CPU, network, disk..) struggling in our cluster
so we are not sure what could be causing this behavior. I can only assume
that somehow Flink or the Kafka consumer is artificially slowing down the
other partitions. Maybe due to how back pressure is handled?

<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1007/consumer_max_lag.png>
 

Gerard





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply via email to