[jira] [Comment Edited] (KAFKA-3083) a soft failure in controller may leave a topic partition in an inconsistent state

Flavio Junqueira (JIRA) Wed, 13 Jan 2016 02:05:34 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095920#comment-15095920
 ]


Flavio Junqueira edited comment on KAFKA-3083 at 1/13/16 10:04 AM:
-------------------------------------------------------------------

[~mgharat] the fact that A kept going with an expired session makes me think 
that A ignored the connection loss event and kept doing controller work. What 
we recommend for mastership with ZooKeeper is that the master stops doing 
master work upon receiving a connection loss event, and then either resumes if 
it reconnects or drops mastership altogether if the session expires. From 
talking to [~junrao] about this, it sounds like the controller isn't processing 
the event that ZkClient is passing up.
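
The recommendation above can be sketched as a small state machine. This is 
only an illustration, not Kafka's controller code: the `Controller` class and 
its methods are hypothetical, and the `KeeperState` enum is a local stand-in 
named after ZooKeeper's connection states (Disconnected, SyncConnected, 
Expired) rather than the real client API.

```python
from enum import Enum


class KeeperState(Enum):
    # Local stand-ins named after ZooKeeper's connection states (assumption:
    # this is not the ZooKeeper client API, just an illustration).
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class Controller:
    """Hypothetical controller that gates master work on session state."""

    def __init__(self):
        self.is_master = True  # assume this node already won the election
        self.paused = False

    def on_state_change(self, state):
        if state is KeeperState.DISCONNECTED:
            # Connection loss: stop master work until we learn more.
            self.paused = True
        elif state is KeeperState.SYNC_CONNECTED:
            # Reconnected: resume, but only if mastership was never dropped.
            self.paused = False
        elif state is KeeperState.EXPIRED:
            # Session expired: drop mastership altogether.
            self.is_master = False
            self.paused = True

    def can_do_master_work(self):
        return self.is_master and not self.paused
```

In this sketch every controller action would check `can_do_master_work()` 
first, so a node that has seen a connection loss or an expiry does not keep 
writing to ZooKeeper or sending requests.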

Let me give you some more context on session semantics. At 2/3 of the session 
timeout, if the client hasn't heard from the server it is currently connected 
to, it will start looking for another server and will notify the application 
via connection loss events. At that point, the recommendation is that the 
client (the broker in this case) stops doing any master work until it learns 
more about the session.
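
As a worked example of the 2/3 rule, the arithmetic looks like this. The 
function name and the 6000 ms timeout are assumptions chosen for illustration, 
and the expiry itself is decided on the server side, so the numbers are only 
approximate.

```python
def connection_loss_deadline_ms(session_timeout_ms):
    """Silence (in ms) after which the client gives up on its current server."""
    return session_timeout_ms * 2 // 3


session_timeout_ms = 6000  # example value, not taken from the issue
deadline_ms = connection_loss_deadline_ms(session_timeout_ms)
remaining_ms = session_timeout_ms - deadline_ms

print(deadline_ms)   # 4000
print(remaining_ms)  # 2000
```

With these numbers the application sees a connection loss after about 4 
seconds of silence, which leaves roughly 2 seconds to reach another server 
before the session can expire.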

I also need to add that I haven't verified this in the code, so it is possible 
that something else is causing the problem, but it sounds wrong that a 
controller with an expired session keeps going.


> a soft failure in controller may leave a topic partition in an inconsistent 
> state
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Mayuresh Gharat
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker 
> change event. As part of this process, let's say it's about to shrink the isr 
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new 
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends 
> the new leaderAndIsr request to the broker (say C) that leads the partition. 
> Broker C will reject this leaderAndIsr request since it comes from a 
> controller with an older epoch. Now we could be in a situation where Broker C 
> thinks the isr has all replicas, but the isr stored in ZK is different.
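
The sequence in the description can be replayed with a toy model. This is a 
minimal sketch, not Kafka code: the `Broker` class, the `zk_isr` variable 
standing in for the state stored in ZK, and the replica ids and epoch numbers 
are all hypothetical. It only shows how an unconditional write to ZK combined 
with an epoch check on the broker leaves the two views different.

```python
class Broker:
    """Toy partition leader that rejects requests from stale controllers."""

    def __init__(self, isr, controller_epoch):
        self.isr = set(isr)
        self.controller_epoch = controller_epoch

    def handle_leader_and_isr(self, request_epoch, isr):
        # Reject requests that carry an older controller epoch.
        if request_epoch < self.controller_epoch:
            return False
        self.controller_epoch = request_epoch
        self.isr = set(isr)
        return True


zk_isr = {1, 2, 3}  # stand-in for the isr stored in ZK
broker_c = Broker(isr={1, 2, 3}, controller_epoch=1)

# Step 2: B becomes controller (epoch 2) and sends the initial leaderAndIsr.
broker_c.handle_leader_and_isr(2, {1, 2, 3})

# Step 3: A (still epoch 1) shrinks the isr in ZK, then notifies broker C.
zk_isr = {1, 2}
accepted = broker_c.handle_leader_and_isr(1, zk_isr)

print(accepted)                # False
print(broker_c.isr == zk_isr)  # False
```

Broker C keeps `{1, 2, 3}` while the stand-in for ZK holds `{1, 2}`, which is 
the inconsistent state the description refers to.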



