You can always shuffle the stream generated by the Kafka source
(dataStream.shuffle()) to evenly distribute records downstream.
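For illustration, here is a minimal plain-JDK sketch (no Flink dependency; class and variable names are my own) of the round-robin redistribution that Flink's rebalance() performs across downstream subtasks — shuffle() picks a random channel instead, which evens out over large record counts:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RebalanceDemo {
    public static void main(String[] args) {
        // Simulate 100 records all arriving from a single hot Kafka partition.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) records.add(i);

        // Redistribute them round-robin across 4 downstream subtasks,
        // analogous to what rebalance() does after the Kafka source.
        int parallelism = 4;
        Map<Integer, List<Integer>> channels = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            channels.computeIfAbsent(i % parallelism, k -> new ArrayList<>())
                    .add(records.get(i));
        }

        // Each subtask ends up with an equal share of the load.
        for (int c = 0; c < parallelism; c++) {
            System.out.println("subtask " + c + " -> "
                    + channels.get(c).size() + " records");
        }
    }
}
```

The point is that after such a redistribution, downstream parallelism no longer mirrors the skew of the Kafka partitions, at the cost of an extra network shuffle between the source and the following operators.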

On Fri, Oct 26, 2018 at 2:08 AM gerardg <ger...@talaia.io> wrote:

> Hi,
>
> We are experiencing issues scaling our Flink application, and we have
> observed that it may be because Kafka message consumption is not balanced
> across partitions. The attached image (lag per partition) shows how only one
> partition consumed messages (the blue one in the back), and it wasn't until
> it finished that the other ones started to consume at a good rate (the total
> throughput actually multiplied by 4 when they started). Also, when those
> partitions started to consume, another partition just stopped and
> accumulated messages again until they finished.
>
> We don't see any resource (CPU, network, disk, ...) under pressure in our
> cluster, so we are not sure what could be causing this behavior. I can only
> assume that somehow Flink or the Kafka consumer is artificially slowing down
> the other partitions. Maybe due to how backpressure is handled?
>
> <
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1007/consumer_max_lag.png>
>
>
> Gerard
