I've seen this before. In that case it was caused by long GC pauses, due in
large part to a JVM heap larger than 8 GB.
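
To see how widespread the flapping is before digging into GC, a small script along these lines could count ISR shrink/expand events per partition. This is only a sketch: the regex is inferred from the log lines quoted below, and the names `count_isr_events` and `flapping_partitions` and the `min_shrinks` threshold are my own, not anything Kafka provides.

```python
import re
from collections import Counter

# Pattern inferred from the broker log lines quoted in this thread, e.g.
# "INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition ..."
ISR_EVENT = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker \d+:\s+"
    r"(?P<action>Shrinking|Expanding) ISR"
)

# Four of the log entries from the quoted message, wrapped as in the email.
SAMPLE = """
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726:
Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726:
Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726:
Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
"""


def count_isr_events(log_text):
    """Return a Counter keyed by (topic, partition, action)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in ISR_EVENT.finditer(flat):
        key = (m.group("topic"), int(m.group("partition")), m.group("action"))
        counts[key] += 1
    return counts


def flapping_partitions(counts, min_shrinks=2):
    """List (topic, partition) pairs whose ISR shrank at least min_shrinks times."""
    return sorted(
        (topic, part)
        for (topic, part, action), n in counts.items()
        if action == "Shrinking" and n >= min_shrinks
    )


counts = count_isr_events(SAMPLE)
print(flapping_partitions(counts))
```

On a real broker you would pass the contents of the server log instead of `SAMPLE`. If the shrink timestamps line up with long pauses in the GC log, that would support the heap-size explanation.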


On Thu, Nov 9, 2017 at 8:17 AM, Json Tu <kafka...@126.com> wrote:

> Hi,
>     We have a Kafka cluster of 6 brokers, each machine with 8 CPUs and
> 16 GB of memory. The cluster has about 1600 topics, and each broker hosts
> about 1700 partition leaders and 1600 partition replicas.
>     When we restart a normally running broker, we find that the ISRs of
> 500+ partitions shrink and expand frequently during the restart.
> There are many log entries like the ones below.
>    [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> (kafka.cluster.Partition)
> [2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726:
> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> (kafka.cluster.Partition)
> [2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726:
> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> (kafka.cluster.Partition)
> [2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726:
> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> (kafka.cluster.Partition)
> [2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726:
> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> (kafka.cluster.Partition)
> [2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726:
> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> (kafka.cluster.Partition)
> [2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726:
> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> (kafka.cluster.Partition)
> …
>     The shrinking and expanding repeats after 30 minutes, which is the
> default value of leader.imbalance.check.interval.seconds. At that time
> we can find the controller's auto-rebalance in the log, which can move
> leadership of some partitions to the restarted broker.
>     We see no shrinking or expanding while the cluster is running, only
> when we restart a broker, so replica.fetch.thread.num is 1, which seems
> enough.
>     We can reproduce it on every restart. Can someone give some suggestions?
> Thanks in advance.
