Hey Stanislav,

Sure. Thanks for your interest in this KIP. I am glad to provide more detail.
> broker A is initiating a controlled shutdown (restart). The Controller
> sends a StopReplicaRequest but it reaches broker A after it has started up
> again. He therefore stops replicating those partitions even though he
> should just be starting to

This is right.

> Controller sends a LeaderAndIsrRequest before broker A initiates a
> restart. Broker A restarts and receives the LeaderAndIsrRequest then. It
> therefore starts leading for the partitions sent by that request and might
> stop leading partitions that it was leading previously. This was well
> explained in the linked JIRA, but I cannot understand why that would
> happen due to my limited experience. If Broker A leads p1 and p2, when
> would a Controller send a LeaderAndIsrRequest with p1 only and not want
> Broker A to drop leadership for p2?

The root cause of the issue is that after a broker restarts, it relies on the first LeaderAndIsrRequest it receives to populate its in-memory partition state, and only then initializes the high watermark checkpoint thread. The high watermark checkpoint thread overwrites the high watermark checkpoint file based on the broker's in-memory partition states. In other words, if a partition that is physically hosted by the broker is missing from the in-memory partition state map, its high watermark will be lost after the checkpoint thread overwrites the file. (Related code: https://github.com/apache/kafka/blob/ed3bd79633ae227ad995dafc3d9f384a5534d4e9/core/src/main/scala/kafka/server/ReplicaManager.scala#L1091)

In your example, assume the first LeaderAndIsrRequest broker A receives is the one initiated by the controlled shutdown logic in the Controller to move leadership away from broker A. That LeaderAndIsrRequest contains only the partitions broker A leads, not all the partitions it hosts (i.e. no follower partitions), so the high watermarks for the follower partitions will be lost. (See the first sketch below.) Also, the first LeaderAndIsrRequest broker A receives may not necessarily be the one initiated by the controlled shutdown logic (e.g. there can be an ongoing preferred leader election), although I think this may not be very common.

> Here the controller will start processing the BrokerChange event (that
> says that broker A shut down) after the broker has come back up and
> re-registered himself in ZK? How will the Controller miss the restart,
> won't he subsequently receive another ZK event saying that broker A has
> come back up?

The Controller will not miss the BrokerChange event; in fact, two BrokerChange events will fire in this case (one for the broker's deregistration in ZK and one for its re-registration). However, when processing a BrokerChange event, the Controller needs to read from ZooKeeper to get the current set of brokers in the cluster, and if the bounced broker has already rejoined the cluster by that time, the Controller will not know the broker was bounced because it sees no diff between ZK and its in-memory cache. So both BrokerChange events effectively become no-ops. (See the second sketch below.)
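To make the high watermark point concrete, here is a first, minimal Scala sketch. The names and numbers are made up for illustration (this is not the actual ReplicaManager code): the checkpoint pass rewrites the file purely from the in-memory partition map, so a hosted partition absent from that map loses its recorded high watermark.

    import scala.collection.mutable

    object HighWatermarkCheckpointSketch {
      final case class TopicPartition(topic: String, partition: Int)

      def main(args: Array[String]): Unit = {
        // On disk before the bounce: broker A hosts t-0 (as leader) and
        // t-1 (as follower), both with checkpointed high watermarks.
        val checkpointFile = mutable.Map(
          TopicPartition("t", 0) -> 42L,
          TopicPartition("t", 1) -> 17L
        )

        // In-memory state populated from the first LeaderAndIsrRequest after
        // startup. If that request came from the controlled-shutdown logic,
        // it names only the partitions broker A was leading, so t-1 is absent.
        val inMemoryPartitions = Map(TopicPartition("t", 0) -> 42L)

        // The checkpoint pass overwrites the file from the in-memory view:
        checkpointFile.clear()
        checkpointFile ++= inMemoryPartitions

        println(checkpointFile) // t-1's high watermark (17) is gone
      }
    }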
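And a second, similarly simplified sketch of the BrokerChange handling (again illustrative, not the real KafkaController code): the handler diffs the broker ids read back from ZK against the controller's in-memory cache, so a broker that deregistered and re-registered before the events are processed produces an empty diff both times.

    object BrokerChangeSketch {
      // Called once per queued BrokerChange event.
      def processBrokerChange(brokersInZk: Set[Int], cachedBrokers: Set[Int]): Unit = {
        val newBrokers  = brokersInZk -- cachedBrokers
        val deadBrokers = cachedBrokers -- brokersInZk
        if (newBrokers.isEmpty && deadBrokers.isEmpty)
          println("no diff -> no-op; no initialization requests sent")
        // else: onBrokerStartup(newBrokers) / onBrokerFailure(deadBrokers)
      }

      def main(args: Array[String]): Unit = {
        // Broker 1 bounced, firing two BrokerChange events, but it is already
        // back in ZK by the time either event is processed:
        processBrokerChange(brokersInZk = Set(1, 2, 3), cachedBrokers = Set(1, 2, 3))
        processBrokerChange(brokersInZk = Set(1, 2, 3), cachedBrokers = Set(1, 2, 3))
      }
    }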
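For completeness, here is the rough shape of the fix the KIP proposes, as I understand it (the field and variable names below are made up for illustration; please see the KIP for the actual protocol changes): stamp control requests with the target broker's generation, e.g. derived from its current ZK registration, so a broker can cheaply reject requests addressed to an earlier incarnation of itself.

    object BrokerGenerationSketch {
      final case class ControlRequest(brokerEpoch: Long)

      // Updated each time the broker (re-)registers in ZooKeeper.
      var currentBrokerEpoch: Long = 200L

      def shouldProcess(req: ControlRequest): Boolean =
        req.brokerEpoch >= currentBrokerEpoch

      def main(args: Array[String]): Unit = {
        // A StopReplicaRequest created before the bounce carries the old
        // generation and is rejected instead of stopping the new replicas:
        println(shouldProcess(ControlRequest(brokerEpoch = 100L))) // false
        println(shouldProcess(ControlRequest(brokerEpoch = 200L))) // true
      }
    }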
Hope that answers your questions. Feel free to follow up if I am missing something.

Thanks,
Zhanxiang (Patrick) Huang

________________________________
From: Stanislav Kozlovski <stanis...@confluent.io>
Sent: Wednesday, October 10, 2018 7:22
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-380: Detect outdated control requests and bounced brokers using broker generation

Hi Patrick,

Thanks for the KIP! Fixing such correctness issues is always very welcome - they're commonly hard to diagnose and debug when they happen in production.

I was wondering if I understood the potential correctness issues correctly. Here is what I got:

- If a broker bounces during controlled shutdown, the bounced broker may accidentally process its earlier generation's StopReplicaRequest sent from the active controller for one of its follower replicas, leaving the replica offline while its remaining replicas may stay online

broker A is initiating a controlled shutdown (restart). The Controller sends a StopReplicaRequest but it reaches broker A after it has started up again. He therefore stops replicating those partitions even though he should just be starting to

- If the first LeaderAndIsrRequest that a broker processes is sent by the active controller before its startup, the broker will overwrite the high watermark checkpoint file and may cause incorrect truncation (KAFKA-7235 <https://issues.apache.org/jira/browse/KAFKA-7235>)

Controller sends a LeaderAndIsrRequest before broker A initiates a restart. Broker A restarts and receives the LeaderAndIsrRequest then. It therefore starts leading for the partitions sent by that request and might stop leading partitions that it was leading previously. This was well explained in the linked JIRA, but I cannot understand why that would happen due to my limited experience. If Broker A leads p1 and p2, when would a Controller send a LeaderAndIsrRequest with p1 only and not want Broker A to drop leadership for p2?

- If a broker bounces very quickly, the controller may start processing the BrokerChange event after the broker has already re-registered itself in zk. In this case, the controller will miss the broker restart and will not send any requests to the broker for initialization. The broker will not be able to accept traffic.

Here the controller will start processing the BrokerChange event (that says that broker A shut down) after the broker has come back up and re-registered himself in ZK? How will the Controller miss the restart, won't he subsequently receive another ZK event saying that broker A has come back up?

Could we explain these potential problems in a bit more detail just so they could be more easily digestible by novices?

Thanks,
Stanislav

On Wed, Oct 10, 2018 at 9:21 AM Dong Lin <lindon...@gmail.com> wrote:

> Hey Patrick,
>
> Thanks much for the KIP. The KIP is very well written.
>
> LGTM. +1 (binding)
>
> Thanks,
> Dong
>
>
> On Tue, Oct 9, 2018 at 11:46 PM Patrick Huang <hzx...@hotmail.com> wrote:
>
> > Hi All,
> >
> > Please find below the KIP, which proposes the concept of broker
> > generation to resolve issues caused by the controller missing broker
> > state changes and brokers processing outdated control requests.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation
> >
> > All comments are appreciated.
> >
> > Best,
> > Zhanxiang (Patrick) Huang
>

--
Best,
Stanislav