I'm sorry for my poor English. What I really mean is that each broker machine has 8 cores and 16 GB of RAM, but the JVM is configured as below:

java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError
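In case it is useful, below is the rough check I plan to run against the GC log to see whether the ISR churn lines up with long pauses. It is only a sketch: it assumes the JDK 8 G1 log format produced by the flags above, and /xx/yy is just the placeholder path from my config.

# print the 20 longest pause durations recorded across the rotated GC logs
grep -hE "GC pause|Full GC" /xx/yy/kafkaServer-gc.log* \
  | grep -oE "[0-9]+\.[0-9]+ secs" \
  | sort -n | tail -20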
We have 30+ clusters with this JVM configuration, all deployed on machines with 8 cores and 16 GB of RAM. Compared to the other clusters, the current cluster has more than 5 times as many partitions. When we restart brokers in the other clusters, there is no such phenomenon. Maybe some metrics or logs can lead us to the root cause of this phenomenon (a sketch of the JMX metrics I plan to watch is appended below the quoted thread). Looking forward to more suggestions.

> On Nov 9, 2017, at 9:59 PM, John Yost <hokiege...@gmail.com> wrote:
>
> I've seen this before and it was due to long GC pauses, due in large part to
> a memory heap > 8 GB.
>
> --John
>
> On Thu, Nov 9, 2017 at 8:17 AM, Json Tu <kafka...@126.com> wrote:
>
>> Hi,
>> we have a kafka cluster which is made of 6 brokers, with 8 CPU cores and
>> 16 GB of memory on each broker’s machine, and we have about 1600 topics in the
>> cluster, with about 1700 partition leaders and 1600 partition replicas on each
>> broker.
>> When we restart a normal broker, we find that 500+ partitions
>> shrink and expand their ISR frequently during the restart;
>> there are many logs as below.
>>
>> [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
>> [2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
>> [2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
>> [2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
>> [2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
>> [2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
>> [2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
>> …
>>
>> The shrink and expand repeats after 30 minutes, which is the default
>> value of leader.imbalance.check.interval.seconds; at that time
>> we can find the controller’s auto-rebalance log, which causes some
>> partitions' leaders to move to this restarted broker.
>> We see no shrink and expand while our cluster is running, except when
>> we restart it, so replica.fetch.thread.num is 1 and it seems enough.
>>
>> We can reproduce it at each restart. Can someone give some suggestions?
>> Thanks in advance.
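P.S. This is the sketch mentioned above. It assumes JMX is enabled on the restarted broker (e.g. JMX_PORT=9999; the host and port below are placeholders) and uses the JmxTool class shipped with Kafka to poll the standard ReplicaManager metrics; adjust it to your environment.

# watch ISR shrink/expand rates and under-replicated partitions during the restart
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name kafka.server:type=ReplicaManager,name=IsrShrinksPerSec \
  --object-name kafka.server:type=ReplicaManager,name=IsrExpandsPerSec \
  --object-name kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions \
  --reporting-interval 5000

If the shrink/expand bursts line up with long pauses in the GC log, that would support the GC theory; if not, the single fetcher thread may simply be unable to keep up during the restart given 5 times the partitions.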