Hi,

Recently we have been running into the `Batch Expired` error every few days, maybe every 3 to 5 days; there is no fixed frequency.

*A*, the error is:

Exception Class : org.apache.kafka.common.errors.TimeoutException
Error Message : Batch Expired

*B*, server.log from Kafka:

[2016-12-18 20:45:32,371] INFO Partition [thl_raw,43] on broker 1002: Shrinking ISR for partition [thl_raw,43] from 1006,1001,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,376] INFO Partition [HeartBit,6] on broker 1002: Shrinking ISR for partition [HeartBit,6] from 1005,1006,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,378] INFO Partition [thl_raw,31] on broker 1002: Shrinking ISR for partition [thl_raw,31] from 1005,1004,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,382] INFO Partition [HeartBit,0] on broker 1002: Shrinking ISR for partition [HeartBit,0] from 1004,1005,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,384] INFO Partition [ConnectorSync,7] on broker 1002: Shrinking ISR for partition [ConnectorSync,7] from 1001,1002,1003 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,386] INFO Partition [__consumer_offsets,8] on broker 1002: Shrinking ISR for partition [__consumer_offsets,8] from 1005,1004,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,389] INFO Partition [thl_raw,37] on broker 1002: Shrinking ISR for partition [thl_raw,37] from 1005,1006,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 20:45:32,391] INFO Partition [HeartBeat,3] on broker 1002: Shrinking ISR for partition [HeartBeat,3] from 1005,1004,1002 to 1002 (kafka.cluster.Partition)
[2016-12-18 21:17:59,888] INFO Rolled new log segment for '__consumer_offsets-46' in 1 ms. (kafka.log.Log)
[2016-12-18 21:19:07,923] INFO Deleting segment 0 from log __consumer_offsets-46. (kafka.log.Log)
[2016-12-18 21:19:07,923] INFO Deleting segment 101935860 from log __consumer_offsets-46. (kafka.log.Log)
[2016-12-18 21:19:07,924] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
[2016-12-18 21:19:07,924] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000101935860.index.deleted (kafka.log.OffsetIndex)
[2016-12-18 21:19:07,924] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000000000000.timeindex.deleted (kafka.log.TimeIndex)
[2016-12-18 21:19:07,924] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000101935860.timeindex.deleted (kafka.log.TimeIndex)
[2016-12-18 21:19:08,393] INFO Deleting segment 102963875 from log __consumer_offsets-46. (kafka.log.Log)
[2016-12-18 21:19:08,410] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000102963875.index.deleted (kafka.log.OffsetIndex)
[2016-12-18 21:19:08,410] INFO Deleting index /kafka/data/__consumer_offsets-46/00000000000102963875.timeindex.deleted (kafka.log.TimeIndex)
[2016-12-18 21:48:53,007] INFO Rolled new log segment for 'thl_raw-24' in 1 ms. (kafka.log.Log)
[2016-12-18 22:15:09,894] INFO Rolled new log segment for 'thl_raw-1' in 0 ms. (kafka.log.Log)
[2016-12-18 23:34:28,526] INFO Rolled new log segment for 'thl_raw-9' in 1 ms. (kafka.log.Log)
[2016-12-18 23:34:28,754] INFO Rolled new log segment for 'thl_raw-39' in 0 ms. (kafka.log.Log)
[2016-12-18 23:34:28,786] INFO Rolled new log segment for 'thl_raw-7' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:32,816] INFO Rolled new log segment for 'thl_raw-15' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,049] INFO Rolled new log segment for 'thl_raw-44' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,137] INFO Rolled new log segment for 'thl_raw-20' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,305] INFO Rolled new log segment for 'thl_raw-40' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,380] INFO Rolled new log segment for 'thl_raw-59' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,470] INFO Rolled new log segment for 'thl_raw-50' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,630] INFO Rolled new log segment for 'thl_raw-35' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:33,995] INFO Rolled new log segment for 'thl_raw-45' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:34,007] INFO Rolled new log segment for 'thl_raw-34' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:34,265] INFO Rolled new log segment for 'thl_raw-48' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:34,359] INFO Rolled new log segment for 'thl_raw-54' in 1 ms. (kafka.log.Log)
[2016-12-19 00:04:34,367] INFO Rolled new log segment for 'thl_raw-10' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:34,540] INFO Rolled new log segment for 'thl_raw-2' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:35,123] INFO Rolled new log segment for 'thl_raw-14' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:36,822] INFO Rolled new log segment for 'thl_raw-29' in 0 ms. (kafka.log.Log)
[2016-12-19 00:04:36,970] INFO Rolled new log segment for 'thl_raw-18' in 0 ms. (kafka.log.Log)

*C*, whenever this error happens, we always see replication problems in Kafka Manager, like:

Topics

Topic                      # Partitions  # Brokers  Brokers Spread %  Brokers Skew %  # Replicas  Under Replicated %  Producer Message/Sec
__consumer_offsets         50            6          100               0               3           16                  0.00
ConnectorSync              8             6          100               16              3           25                  0.00
EventInstance              8             6          100               16              3           12                  0.00
fjord_healthy_checker      8             6          100               16              3           12                  0.00
HeartBeat                  8             6          100               16              3           12                  0.00
HeartBit                   8             6          100               0               3           25                  0.00
Notification               8             6          100               33              3           12                  0.00
NotificationEventInstance  8             6          100               16              3           12                  0.00
thl_raw                    64            6          100               0               3           17                  0.00

(Each topic name links to http://fjord-staging2-kafka-manager.infradev.zuora.com:9000/clusters/fjord-staging2/topics/<topic>.)

*D*, all of the replication problems seem to be related to node `1002`. Clicking into each topic, all of the affected partitions look similar to the highlighted row (partition 3) below:

Partition Information

Partition  Latest Offset  Leader  Replicas            In Sync Replicas    Preferred Leader?  Under Replicated?
0                         1005    (1005,1001,1002)    (1005,1002,1001)    true               false
1                         1006    (1006,1002,1003)    (1006,1003,1002)    true               false
2                         1001    (1001,1003,1004)    (1004,1003,1001)    true               false
3                         *1002*  *(1002,1004,1005)*  *(1002)*            *true*             *true*
4                         1003    (1003,1005,1006)    (1003,1006,1005)    true               false
5                         1004    (1004,1006,1001)    (1004,1001,1006)    true               false
6                         1005    (1005,1002,1003)    (1003,1005,1002)    true               false
7                         1006    (1006,1003,1004)    (1003,1006,1004)    true               false

(Each leader id links to http://fjord-staging2-kafka-manager.infradev.zuora.com:9000/clusters/fjord-staging2/brokers/<broker id>.)
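
To check whether the ISR shrinks in server.log really all point at one broker, a small script along these lines could tally the `Shrinking ISR` entries. This is only a sketch: the function name `summarize_isr_shrinks` and the regular expression are assumptions written against the log lines quoted in *B*, not anything provided by Kafka, and other broker versions may format the line differently.

```python
import re
from collections import Counter

# Pattern written against the "Shrinking ISR" lines quoted in section B.
# It is an assumption about the log format, not an official Kafka format spec.
SHRINK_RE = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker (?P<broker>\d+): "
    r"Shrinking ISR for partition \[[^\]]+\] from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def summarize_isr_shrinks(lines):
    """Return two Counters keyed by broker id (as strings):
    how many shrinks each broker reported, and how often each broker
    was removed from an ISR."""
    reported_by = Counter()
    dropped = Counter()
    for line in lines:
        match = SHRINK_RE.search(line)
        if match is None:
            continue  # not an ISR shrink entry
        old_isr = match.group("old").split(",")
        new_isr = set(match.group("new").split(","))
        reported_by[match.group("broker")] += 1
        for broker_id in old_isr:
            if broker_id not in new_isr:
                dropped[broker_id] += 1
    return reported_by, dropped


if __name__ == "__main__":
    # Two entries copied from the server.log excerpt in section B.
    sample = [
        "[2016-12-18 20:45:32,371] INFO Partition [thl_raw,43] on broker 1002: "
        "Shrinking ISR for partition [thl_raw,43] from 1006,1001,1002 to 1002 "
        "(kafka.cluster.Partition)",
        "[2016-12-18 20:45:32,376] INFO Partition [HeartBit,6] on broker 1002: "
        "Shrinking ISR for partition [HeartBit,6] from 1005,1006,1002 to 1002 "
        "(kafka.cluster.Partition)",
    ]
    reported_by, dropped = summarize_isr_shrinks(sample)
    print("shrinks reported by broker:", dict(reported_by))
    print("times dropped from an ISR:", dict(dropped))
```

In practice the list of lines would come from reading server.log on each broker. The tally keeps the broker that reported the shrink separate from the brokers that were removed, since in the excerpt above it is 1002 that reports the shrink while the other replicas are the ones dropped.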