Hi,
 If I understand your problem correctly, there is a similar JIRA
issue, FLINK-10348, reported by me. Maybe you can take a look at it.


Best,
Jiayi Liao


Original Message
Sender: Gerard Garcia <ger...@talaia.io>
Recipient: fearsome.lucidity <fearsome.lucid...@gmail.com>
Cc: user <u...@flink.apache.org>
Date: Monday, Oct 29, 2018 17:50
Subject: Re: Unbalanced Kafka consumer consumption


The stream is partitioned by key after ingestion at the finest granularity that
we can (which is finer than how the stream is partitioned when produced to
Kafka). It is not perfectly balanced, but it is not so unbalanced as to show
this behavior (it is more balanced than what the lag images show).
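
(For context, the job is roughly shaped like the sketch below; the event type,
key selector, and function names are placeholders, not our actual code.)

// Rough shape: Kafka source, then keyBy on a key finer-grained than the Kafka partitioning.
// "Event", "getFineGrainedKey" and "MyKeyedFunction" are placeholders.
DataStream<Event> events = env.addSource(kafkaConsumer);

events
    .keyBy(event -> event.getFineGrainedKey())  // repartitions: same key -> same downstream subtask
    .process(new MyKeyedFunction());            // processing-time logic, no windows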


Anyway, let's assume that the problem is that the stream is so unbalanced that
one operator subtask can't handle the ingestion rate. Is it then expected that
all the other operator subtasks reduce their ingestion rate even if they have
resources to spare? The task is configured with processing time and there are
no windows. If that is the case, is there a way to let operator subtasks
process freely even if one of them is causing back pressure upstream?


The attached images show how the Kafka lag increases while the throughput
stays stable until some operator subtasks finish.


Thanks,


Gerard


On Fri, Oct 26, 2018 at 8:09 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:

You can always shuffle the stream generated by the Kafka source
(dataStream.shuffle()) to evenly distribute records downstream.
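
For example, a minimal sketch with the Java DataStream API (the topic name,
properties, and map function are placeholders, and it assumes the
FlinkKafkaConsumer011 connector):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class ShuffleExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        kafkaProps.setProperty("group.id", "my-group");                // placeholder

        DataStream<String> source = env.addSource(
                new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), kafkaProps));

        // shuffle() repartitions records randomly across all downstream subtasks,
        // so skew in the Kafka partition assignment does not pin the load to a
        // few subtasks.
        source.shuffle()
              .map(String::toUpperCase)
              .print();

        env.execute("shuffle-example");
    }
}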


On Fri, Oct 26, 2018 at 2:08 AM gerardg <ger...@talaia.io> wrote:

Hi,
 
 We are experiencing issues scaling our Flink application, and we have observed
 that it may be because Kafka message consumption is not balanced across
 partitions. The attached image (lag per partition) shows how only one
 partition consumes messages (the blue one in the back), and it wasn't until
 it finished that the other ones started to consume at a good rate (the total
 throughput actually multiplied by 4 when they started). Also, when those
 started to consume, one partition just stopped and accumulated messages again
 until the others finished.
 
 We don't see any resource (CPU, network, disk...) struggling in our cluster,
 so we are not sure what could be causing this behavior. I can only assume
 that somehow Flink or the Kafka consumer is artificially slowing down the
 other partitions. Maybe due to how back pressure is handled?
 
 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1007/consumer_max_lag.png
 
 
 Gerard
 
 
 
 
 
 --
 Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
