[ https://issues.apache.org/jira/browse/KAFKA-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100986#comment-16100986 ]

Jason Gustafson commented on KAFKA-5611:
----------------------------------------

bq. Is the above a valid scenario? Does that mean that we have a healthy consumer that has been assigned no partitions and it will never get any until there is a reason to rebalance?

The heartbeat thread begins after the join completes, but a wakeup could prevent the assignment from finishing. From the perspective of the coordinator logic, there is nothing left to be done, and that will remain the case until the next rebalance.

bq. Secondly, we were looking at the same code and were thinking (since there is no stacktrace at the moment) that the wakeup call was against the client.poll(future) line in the same code where you applied the patch. Couldn't that be the case? What would happen then?

This case was previously handled: the future will not be reset, and the next call to poll() will resume waiting for it. What we did not account for was the metadata fetch that occurs in the onJoinComplete callback. We can discuss this on the PR if you like.

> One or more consumers in a consumer-group stop consuming after rebalancing
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5611
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5611
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.0
>            Reporter: Panos Skianis
>            Assignee: Jason Gustafson
>              Labels: reliability
>             Fix For: 0.11.0.1
>
>         Attachments: bad-server-with-more-logging-1.tar.gz, kka02, Server 1, Server 2, Server 3
>
>
> Scenario:
> - 3 ZooKeepers and 4 Kafka brokers on 0.10.2.0, with 0.9.0 compatibility still enabled (other apps need it, but the app described below is already on the Kafka 0.10.2.0 client).
> - 3 servers, each running 1 consumer under the same consumer groupId.
> - The servers consume messages normally until a timeout to an external service causes our app to restart the Kafka consumer on one of the servers (this is by design). That triggers a rebalance of the group, and after the restart one of the consumers appears to "block".
> - Server 3 is where the problems occur.
> - The problem fixes itself either by restarting one of the 3 servers or by causing the group to rebalance again, e.g. by running the console consumer against the same group with autocommit set to false.
>
> Notes:
> - We have not managed to reproduce it at will yet.
> - It happens mainly in the production environment, and often enough, but we do not yet have any logs with DEBUG/TRACE statements.
> - Extracts from the log of each app server are attached, along with the log of the broker that appears to be handling the affected group and its generations.
> - See the COMMENT lines in the files for further info.
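
To make the wakeup interaction discussed in the comment concrete, here is a minimal caller-side sketch. It is not the patch itself, only an illustration under assumptions: broker address, group id, and topic name are placeholders, and the shutdown hook stands in for the reporter's timeout handler that restarts the consumer. The point is that wakeup() from another thread surfaces as a WakeupException out of poll(); the window this ticket is about is a wakeup that lands while the consumer is still completing the join (the metadata fetch in onJoinComplete), which could leave it with an empty assignment until the next rebalance.

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class WakeupDuringRebalanceSketch {

    public static void main(String[] args) {
        // Placeholder configuration; broker, group, and topic names are made up.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic"));

        // Stand-in for the external-timeout handler described in the report:
        // another thread calls wakeup(), which makes the current or next poll()
        // throw WakeupException.
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                consumer.wakeup();
            }
        }));

        try {
            while (true) {
                // If the wakeup arrives while the consumer is still finishing the
                // join (e.g. during the metadata fetch in onJoinComplete), the
                // affected versions could come back with no assigned partitions.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        } catch (WakeupException e) {
            // Expected when wakeup() was called; fall through to close().
        } finally {
            consumer.close();
        }
    }
}
{code}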
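The workaround mentioned in the description, joining the same group to force another rebalance, can also be sketched programmatically instead of via the console consumer. Group, topic, and broker names below are again placeholders; disabling auto-commit keeps the temporary consumer from moving the group's committed offsets.

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ForceRebalanceSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
        props.put("group.id", "the-stuck-group");        // same groupId as the blocked consumers
        props.put("enable.auto.commit", "false");        // do not move the group's offsets
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        try {
            // Subscribing and polling once joins the group, which triggers a
            // rebalance and reassigns partitions across the existing members;
            // closing the consumer triggers one more rebalance on the way out.
            consumer.subscribe(Collections.singletonList("example-topic"));
            consumer.poll(5000);
        } finally {
            consumer.close();
        }
    }
}
{code}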