Hello Kafka Dev,
We need help with a lag issue we are seeing in one of our environments, which 
does not have much load. We run Kafka in multiple environments, and in one of 
them events take a very long time (sometimes more than a day) to get processed 
from Kafka. The topic has two partitions and 3 replicas, and two consumers are 
running on it (so there is a one-to-one mapping between partition and 
consumer). When I run kafka-consumer-groups.sh to get the stats, I can see lag 
on one of the consumers; after some time the lag moves to the other consumer, 
and they keep switching over time, which increases the time to process events. 
It looks to me like rebalancing is happening, but the consumer-id stays the 
same, so the consumers are not being restarted in between. We also tried 
restarting Kafka and ZooKeeper, but the end result is the same. Here are the 
details.


[2018-10-12 03:52:21,676] WARN Removing server circle2-kafka2:909 from 
bootstrap.servers as DNS resolution failed for circle2-kafka2 
(org.apache.kafka.clients.ClientUtils)
group-es
group-rds

[vikas@circle1-kafka1 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server 
circle1-kafka1:9092,circle2-kafka2:9092, circle1-kafka3 -describe -group 
group-rds
Note: This will not show information about old Zookeeper-based consumers.
[2018-10-12 03:53:06,226] WARN Removing server circle2-kafka2:9092 from 
bootstrap.servers as DNS resolution failed for circle2-kafka2 
(org.apache.kafka.clients.ClientUtils)
[2018-10-12 03:53:06,436] WARN Removing server circle2-kafka2:9092 from 
bootstrap.servers as DNS resolution failed for circle2-kafka2 
(org.apache.kafka.clients.ClientUtils)

TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                                                                    HOST           CLIENT-ID
topic.events  1          45471           45471           0     data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-dc1cb0e1-48fb-40c5-bd96-0e9980e1083d  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
topic.events  0          344987          346323          1336  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-3a13af04-048f-40b4-9b09-b74a9600dfd8  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds



[vikas@circle1-kafka1 kafka]$ ./bin/kafka-consumer-groups.sh --bootstrap-server 
circle1-kafka1:9092,circle2-kafka2:9092,circle1-kafka3 -describe -group 
group-rds
Note: This will not show information about old Zookeeper-based consumers.
[2018-10-12 04:04:29,725] WARN Removing server circle2-kafka2:9092 from 
bootstrap.servers as DNS resolution failed for circle2-kafka2 
(org.apache.kafka.clients.ClientUtils)
[2018-10-12 04:04:29,926] WARN Removing server circle2-kafka2:9092 from 
bootstrap.servers as DNS resolution failed for circle2-kafka2 
(org.apache.kafka.clients.ClientUtils)

TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID                                                                                    HOST           CLIENT-ID
topic.events  1          44873           45471           598   data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-dc1cb0e1-48fb-40c5-bd96-0e9980e1083d  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
topic.events  0          346324          346324          0     data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds-3a13af04-048f-40b4-9b09-b74a9600dfd8  /172.27.4.133  data-consumer-i-00404a50d7551ef37-circle1-ecs2-group-rds
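
To make the two runs above easier to compare, here is a minimal Python sketch 
that computes, per partition, the lag in each run and how far the committed 
offset and the log end offset moved in between. The function and variable 
names are hypothetical helpers for illustration, not part of Kafka; the 
offsets are copied by hand from the 03:53 and 04:04 outputs above.

```python
# Hypothetical helper, not part of Kafka: compares two hand-copied snapshots
# of `kafka-consumer-groups.sh --describe` output for one consumer group.
# Each snapshot maps partition -> (CURRENT-OFFSET, LOG-END-OFFSET).

def lag(current_offset, log_end_offset):
    """Lag as reported by the tool: log end offset minus committed offset."""
    return log_end_offset - current_offset


def compare_snapshots(before, after):
    """Return one summary dict per partition, ordered by partition number."""
    findings = []
    for partition in sorted(before):
        cur_before, end_before = before[partition]
        cur_after, end_after = after[partition]
        findings.append({
            "partition": partition,
            "lag_before": lag(cur_before, end_before),
            "lag_after": lag(cur_after, end_after),
            # Negative value means the committed offset moved backwards.
            "committed_delta": cur_after - cur_before,
            # Number of new events appended to the partition between runs.
            "produced": end_after - end_before,
            "rewound": cur_after < cur_before,
        })
    return findings


# Offsets from the 03:53 and 04:04 runs shown above.
snapshot_0353 = {1: (45471, 45471), 0: (344987, 346323)}
snapshot_0404 = {1: (44873, 45471), 0: (346324, 346324)}

for row in compare_snapshots(snapshot_0353, snapshot_0404):
    print(row)
```

On these numbers, partition 0 caught up (lag 1336 to 0) while only one new 
event was produced. On partition 1 the log end offset did not change, but the 
committed offset went from 45471 back to 44873, so the lag of 598 there does 
not come from new events.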



Here are the details of the Kafka environment:
1) Version -> kafka_2.11-1.1.0

2) ZooKeeper settings -> default

3) Kafka settings -> mostly default; these are the few specific changes we 
have made:
zookeeper.connection.timeout.ms=6000
#Setting the replication for nodes under the default of 3
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.retention.hours=24

Please let me know if you need more details from my end.

Your quick help is much appreciated. If you are not able to help, or if I am 
in the wrong group, please point me to the right one.

Regards,
Vikas
