[ https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110748#comment-15110748 ]
Flavio Junqueira commented on KAFKA-3083:
-----------------------------------------

Sure, we need to transform all operations to look like what we currently have in ZKCheckedEphemeral. That particular class is a bit special because it performs checks and such, but essentially we need to change the current calls in ZkUtils to use asynchronous calls on the ZK handle directly, with a callback class that pairs up with each call.

Related to this present issue, we will also need to implement session management, but this time it can't try to be transparent like ZkClient does. It is good to have a central point to get the current zk handle from, but we need to give the broker the ability to signal when to create a new session. As part of this signaling, we will need to implement some kind of listener to propagate events. Another option is to let the broker directly implement a Watcher to process event notifications.

One simple way to start is to gradually replace the calls in ZkUtils with asynchronous calls, still using the handle ZkUtils provides. The calls would block to maintain the current behavior outside ZkUtils. Once that's done, we can make the calls non-blocking and do the necessary changes across broker/controller. Finally, we can replace the session management with our own.

If you guys want to do this, then we should probably create an umbrella jira.

> a soft failure in controller may leave a topic partition in an inconsistent state
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Mayuresh Gharat
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker
> change event. As part of this process, let's say it's about to shrink the isr
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends
> the new leaderAndIsr request to the broker (say C) that leads the partition.
> Broker C will reject this leaderAndIsr since the request comes from a
> controller with an older epoch. Now we could be in a situation that Broker C
> thinks the isr has all replicas, but the isr stored in ZK is different.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
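
For illustration only, the first migration step in the comment (an asynchronous call paired with a callback, which still blocks the caller so behavior outside ZkUtils does not change) might be sketched roughly as follows. Everything here is an assumption made for the sketch: `AsyncStore`, `DataCallback`, `InMemoryStore`, `blockingGetData` and the return codes are hypothetical stand-ins, modeled only loosely on ZooKeeper's asynchronous `getData`, so that the example runs without a ZooKeeper ensemble. It is not the Kafka or ZooKeeper API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncBlockingSketch {

    // Hypothetical return codes, not ZooKeeper's real ones.
    static final int OK = 0;
    static final int NO_NODE = -1;

    /** Hypothetical callback that pairs up with one asynchronous read. */
    interface DataCallback {
        void processResult(int rc, String path, byte[] data);
    }

    /** Hypothetical stand-in for the asynchronous side of a ZK handle. */
    interface AsyncStore {
        void getData(String path, DataCallback cb);
    }

    /** In-memory store that completes each read on a separate event thread. */
    static final class InMemoryStore implements AsyncStore {
        private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();
        private final ExecutorService eventThread =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "event-thread");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });

        void put(String path, byte[] data) {
            nodes.put(path, data);
        }

        @Override
        public void getData(String path, DataCallback cb) {
            // Returns immediately; the callback fires later on the event thread.
            eventThread.execute(() -> {
                byte[] data = nodes.get(path);
                cb.processResult(data == null ? NO_NODE : OK, path, data);
            });
        }
    }

    /**
     * Issues the asynchronous call, then waits on a latch until the callback
     * has run, so the caller still sees a synchronous read.
     */
    static byte[] blockingGetData(AsyncStore store, String path) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger rc = new AtomicInteger(OK);
        AtomicReference<byte[]> result = new AtomicReference<>();

        store.getData(path, (code, p, data) -> {
            rc.set(code);
            result.set(data);
            done.countDown(); // release the waiting caller
        });

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for " + path, e);
        }
        if (rc.get() != OK) {
            throw new IllegalStateException("read of " + path + " failed, rc=" + rc.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("/controller", "broker-A".getBytes(StandardCharsets.UTF_8));
        byte[] data = blockingGetData(store, "/controller");
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

In this sketch, making a call non-blocking later (the second step in the comment) would amount to dropping the latch and letting the caller pass its own callback, which is why the callers in broker/controller would need to change at that point.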