[ 
https://issues.apache.org/jira/browse/KAFKA-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833869#comment-15833869
 ] 

Vishal Shukla commented on KAFKA-4676:
--------------------------------------


Hi Jason,

Thank you very much for the immediate action on this.

The consumer logs from consumer-node-01 & consumer-node-02 at the time the 
topic got stuck are attached as [^stuck-consumer-node-1.log] and 
[^stuck-consumer-node-2.log] respectively. This was around 2017-01-21 03:45 CET.

Config at that time:

{code}
session.timeout.ms=15000
max.poll.interval.ms=300000
max.poll.records=500
request.timeout.ms=3050000
{code}
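
As a rough guide to what this config implies for the poll loop, the per-record processing budget can be estimated as below. This is only a back-of-the-envelope sketch: it assumes every {{poll()}} returns a full batch of {{max.poll.records}} and that the consumer has to call {{poll()}} again within {{max.poll.interval.ms}} to stay in the group. The helper names are made up for illustration and are not part of the Kafka client.

```python
# The consumer config quoted above, as plain key=value text.
CONFIG_TEXT = """
session.timeout.ms=15000
max.poll.interval.ms=300000
max.poll.records=500
request.timeout.ms=3050000
"""


def parse_properties(text):
    """Parse simple key=value lines into a dict of ints (hypothetical helper)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = int(value.strip())
    return config


def per_record_budget_ms(config):
    """Average time available per record if each poll() returns a full batch."""
    return config["max.poll.interval.ms"] / config["max.poll.records"]


config = parse_properties(CONFIG_TEXT)
budget = per_record_budget_ms(config)
print(f"average budget: {budget:.0f} ms per record")
```

With the values above this works out to 600 ms per record on average. A lower {{max.poll.records}} raises that budget proportionally.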

Restarting the consumer-node-02 service then triggered a rebalance as expected, 
and the messages were consumed fine. The logs from both nodes during that 
restart are attached as [^restart-node2-consumer-node-2.log] & 
[^restart-node2-consumer-node-1.log].

Things stayed normal for a few hours, until around 2017-01-21 13:21 CET. This 
time the case seemed a little different from the previous one. There were no 
Kafka logs on consumer-node-2. However, consumer-node-1 constantly logged 
rejoining, partition assignment and a warning about config, as shown in 
[^stuck-case2.log].

After this case, we changed {{session.timeout.ms}} to {{300000}} and 
{{max.poll.records}} to {{100}}. This got rid of the warning, but we still 
occasionally observe the rejoining & assignment logs in the consumers.

{code}
2017-01-22 03:30:05,919 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [] for group event-saved-group
2017-01-22 03:30:05,919 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
2017-01-22 03:30:06,692 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [event-saved-prod-2-8] for group event-saved-group
2017-01-22 03:30:06,692 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
2017-01-22 03:30:06,720 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [event-saved-prod-2-2] for group event-saved-group
2017-01-22 03:30:06,720 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
2017-01-22 03:30:06,720 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [event-saved-prod-2-7] for group event-saved-group
2017-01-22 03:30:06,720 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
2017-01-22 03:30:07,956 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group event-saved-group with generation 81
2017-01-22 03:30:07,956 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Setting newly assigned partitions [event-saved-prod-2-5] for group event-saved-group
2017-01-22 03:30:07,957 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group event-saved-group with generation 81
2017-01-22 03:30:07,957 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Setting newly assigned partitions [event-saved-prod-2-2] for group event-saved-group
2017-01-22 03:30:07,960 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group event-saved-group with generation 81
2017-01-22 03:30:07,960 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group event-saved-group with generation 81
2017-01-22 03:30:07,960 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Setting newly assigned partitions [] for group event-saved-group
2017-01-22 03:30:07,960 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Setting newly assigned partitions [event-saved-prod-2-6] for group event-saved-group
2017-01-22 03:30:10,958 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [event-saved-prod-2-5] for group event-saved-group
2017-01-22 03:30:10,958 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
2017-01-22 03:30:10,971 [INFO] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking previously assigned partitions [] for group event-saved-group
2017-01-22 03:30:10,971 [INFO] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group event-saved-group
{code}
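
To quantify how often the group rebalances, coordinator log lines like the ones above could be summarised with a small script along these lines. It is a rough sketch rather than something we have run against the full logs: the function names and regular expressions are illustrative, and it assumes the log format shown above (each entry starts with a timestamp and may be wrapped across several lines).

```python
import re
from collections import Counter

# A log entry starts with a timestamp such as "2017-01-22 03:30:05,919".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")

# Coordinator messages of interest, keyed by a short event name.
EVENTS = {
    "revoked": re.compile(r"Revoking previously assigned partitions \[(.*?)\]"),
    "rejoin": re.compile(r"\(Re-\)joining group"),
    "joined": re.compile(r"Successfully joined group \S+ with generation (\d+)"),
    "assigned": re.compile(r"Setting newly assigned partitions \[(.*?)\]"),
}


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per log entry."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarise(lines):
    """Count coordinator events and collect the group generations seen."""
    counts = Counter()
    generations = set()
    for record in join_wrapped(lines):
        for name, pattern in EVENTS.items():
            match = pattern.search(record)
            if match:
                counts[name] += 1
                if name == "joined":
                    generations.add(int(match.group(1)))
    return counts, generations


# Three entries taken from the excerpt above, in their wrapped form.
sample = """\
2017-01-22 03:30:06,692 [INFO]
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Revoking
previously assigned partitions [event-saved-prod-2-8] for group
event-saved-group
2017-01-22 03:30:06,692 [INFO]
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining
group event-saved-group
2017-01-22 03:30:07,956 [INFO]
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully
joined group event-saved-group with generation 81
""".splitlines()

counts, generations = summarise(sample)
print(dict(counts), sorted(generations))
```

Running {{summarise}} over a whole log file (for example {{summarise(open(path))}}, with {{path}} pointing at one of the attached logs) would give the number of revocations and rejoins, and the range of generations, for that period.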

Do these logs look unusual to you? Also note that we are running the app 
against a single-machine ZK & Kafka setup (both in the same VM). We are aware 
that this isn't recommended and that ZK should run separately as a quorum. 
However, I assume this has nothing to do with the current issue. Please let me 
know if that is incorrect or if you need more info on it.

Regarding which partitions get stuck: sometimes it is only the partitions 
assigned to one consumer, and many times all partitions get stuck. I need to 
wait for the issue to recur in order to give you the exact partitions along 
with the respective logs. I will post them once I observe the issue again.

I hope this info helps. Please let us know if any other information can help.


> Kafka consumers gets stuck for some partitions
> ----------------------------------------------
>
>                 Key: KAFKA-4676
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4676
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0
>            Reporter: Vishal Shukla
>            Priority: Critical
>              Labels: consumer, reliability
>         Attachments: restart-node2-consumer-node-1.log, 
> restart-node2-consumer-node-2.log, stuck-case2.log, 
> stuck-consumer-node-1.log, stuck-consumer-node-2.log, 
> stuck-topic-thread-dump.log
>
>
> We recently upgraded to Kafka 0.10.1.0. We are frequently facing issue that 
> Kafka consumers get stuck suddenly for some partitions.
> Attached thread dump.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
