On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> Hi Colin,
> 
> The KIP says that "brokers which are fenced will not appear in
> MetadataResponses.  So clients that have up-to-date metadata will not try
> to contact fenced brokers.", which is fine, but it doesn't seem to
> elaborate on what happens for clients which try to contact a broker (using
> stale metadata, for example) and find it in the registered-but-fenced
> state.

Hi Tom,

I have dropped the broker-side fencing from this proposal.  So now the fencing 
is basically the same as today's fencing: it's controller-side only.  That 
means that clients using stale metadata can contact fenced brokers and 
communicate with them.  The only case where the broker will be inaccessible is 
when it hasn't finished starting yet (i.e., has not yet attained the RUNNING state).

Just like today, we expect the safeguards built into the replication protocol 
to prevent the worst corner cases that could result from this.  I do think we 
will probably take up this issue later and find a better solution for 
client-side metadata, but this KIP is big enough as-is.

best,
Colin


> You said previously that the broker will respond with a retriable
> error, which is again fine, but you didn't answer my question about which
> error code you've chosen for this. I realise that this doesn't really
> affect the correctness of the behaviour, but I'm not aware of any existing
> error code which is a good fit. So it would be good to understand how
> you're making this backward compatible for clients.
> 
> Many thanks,
> 
> Tom
> 
> On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. Just a couple of more comments below.
> > >
> > > 210. Since we are deprecating zookeeper.connection.timeout.ms, should we
> > > add a new config to bound the time for a broker to connect to the
> > > controller during startup?
> > >
> >
> > Good idea.  I added initial.broker.registration.timeout.ms for this.
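
As an illustration, a rough sketch of how a config like this could bound the
registration wait at startup (the class and method names here are just
assumptions for the sketch, not the actual broker code):

    import java.util.concurrent.TimeoutException;
    import java.util.function.BooleanSupplier;

    // Sketch only: give up on startup if the broker cannot register with the
    // controller within initial.broker.registration.timeout.ms.
    final class RegistrationWaitSketch {
        static void awaitRegistration(long timeoutMs, BooleanSupplier registered)
                throws InterruptedException, TimeoutException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!registered.getAsBoolean()) {
                if (System.currentTimeMillis() >= deadline) {
                    // Fail startup rather than hang forever without a controller.
                    throw new TimeoutException("Could not register within " + timeoutMs + " ms");
                }
                Thread.sleep(100);
            }
        }
    }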
> >
> > > 211. BrokerHeartbeat no longer has the state field in the
> > request/response.
> > > However, (a) the controller shutdown section still has "In its periodic
> > > heartbeats, the broker asks the controller if it can transition into the
> > > SHUTDOWN state.  This motivates the controller to move all of the leaders
> > > off of that broker.  Once they are all moved, the controller responds to
> > > the heartbeat with a nextState of SHUTDOWN."; (b) the description of
> > > BrokerHeartbeat still references currentState and targetState.
> > >
> >
> > Thanks.  I've made these sections clearer and removed the obsolete
> > references to sending states.
> >
> > best,
> > Colin
> >
> > > Jun
> > >
> > > On Fri, Dec 11, 2020 at 1:33 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > On Wed, Dec 9, 2020, at 10:10, Jun Rao wrote:
> > > > > Hi, Colin,
> > > > >
> > > > > Thanks for the update. A few more follow up comments.
> > > > >
> > > >
> > > > Hi Jun,
> > > >
> > > > Thanks again for the review.
> > > >
> > > > > 100. FailedReplicaRecord: Since this is reported by each broker
> > > > > independently, perhaps we could use a more concise representation
> > > > > that has a top level broker field and an array of topics, each of
> > > > > which has an array of partitions.
> > > > >
> > > >
> > > > The issue is that there is a size limit on each record.  Putting all
> > > > of the partitions of a log directory into a single record would
> > > > probably break that in many cases.  Still, we can optimize a bit by
> > > > having an array of partition IDs, since nearly all the time we have
> > > > more than one from the same topic.
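
As an illustration, a minimal sketch of the per-topic grouping described above
(the field names are assumptions, not the final record schema):

    import java.util.List;

    // Sketch only: one record per topic keeps each record small, while the
    // array of partition IDs avoids repeating the topic name per partition.
    record FailedReplicaRecordSketch(int brokerId, String topic, List<Integer> partitionIds) {}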
> > > >
> > > > > 200. Sounds good. If we remove the broker-side fencing logic, do we
> > plan
> > > > to
> > > > > still keep FENCED in broker state? Do we plan to expose the new
> > states
> > > > > through the existing BrokerState metric and if so, what are the
> > values
> > > > for
> > > > > the new states?
> > > > >
> > > >
> > > > No, we don't need FENCED any more.  I have removed it from the KIP.
> > > >
> > > > The new states are very similar to the current ones, actually.  There
> > are
> > > > no new states or removed ones.  The main change in the broker state
> > machine
> > > > is that the RECOVERING_FROM_UNCLEAN_SHUTDOWN state has been renamed to
> > > > RECOVERY.  Also, unlike previously, the broker will always pass through
> > > > RECOVERY (although it may only stay in this state for a very short
> > amount
> > > > of time).
> > > >
> > > > > 201. This may be fine too. Could we document what happens when the
> > > > > broker.id/controller.id in meta.properties don't match the broker
> > broker
> > > > > config when the broker starts up?
> > > > >
> > > >
> > > > I added some documentation about this.
> > > >
> > > > > 204. There is still "The highest metadata offset which the broker
> > has not
> > > > > reached" referenced under BrokerRegistration.
> > > > >
> > > >
> > > > It should be CurrentMetadataOffset.  Fixed.
> > > >
> > > > > 206. Is that separate step needed given KIP-516? With KIP-516 (
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsr
> > > > ),
> > > > > we don't need to wait for the topic data to be removed from all
> > > > > brokers before removing the topic metadata. Either a mismatched
> > > > > topicId or a missing topicId in the metadata is enough for the broker
> > > > > to clean up deleted topics asynchronously.
> > > >
> > > > It won't be needed once KIP-516 is adopted, but this hasn't been
> > > > implemented yet.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 8, 2020 at 5:27 PM Colin McCabe <cmcc...@apache.org>
> > wrote:
> > > > >
> > > > > > On Thu, Dec 3, 2020, at 16:37, Jun Rao wrote:
> > > > > > > Hi, Colin,
> > > > > > >
> > > > > > > Thanks for the updated KIP. A few more comments below.
> > > > > > >
> > > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks again for the reviews.
> > > > > >
> > > > > > > 80.2 For deprecated configs, we need to include zookeeper.* and
> > > > > > > broker.id.generation.enable.
> > > > > > >
> > > > > >
> > > > > > Added.
> > > > > >
> > > > > > > 83.1 If a broker is down, does the controller keep the previously
> > > > > > > registered broker epoch forever? If not, how long does the
> > controller
> > > > > > keep
> > > > > > > it? What does the controller do when receiving a broker heartbeat
> > > > request
> > > > > > > with an unfound broker epoch?
> > > > > > >
> > > > > >
> > > > > > Yes, the controller keeps the previous registration forever.
> > > > > >
> > > > > > Broker heartbeat requests with an incorrect broker epoch will be
> > > > > > rejected with STALE_BROKER_EPOCH.
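
As an illustration, a rough sketch of that check (the class, method, and field
names here are assumptions, not the actual controller code):

    import java.util.Map;

    // Sketch only: the registration (and its epoch) is kept even while the
    // broker is fenced, so only a mismatched epoch is rejected.
    final class HeartbeatCheckSketch {
        record Registration(long brokerEpoch) {}
        enum Error { NONE, STALE_BROKER_EPOCH }

        static Error checkHeartbeat(Map<Integer, Registration> registrations,
                                    int brokerId, long requestEpoch) {
            Registration reg = registrations.get(brokerId);
            if (reg == null || reg.brokerEpoch() != requestEpoch) {
                return Error.STALE_BROKER_EPOCH;
            }
            return Error.NONE;
        }
    }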
> > > > > >
> > > > > > > 100. Have you figured out if we need to add a new record type for
> > > > > > reporting
> > > > > > > partitions on failed disks?
> > > > > > >
> > > > > >
> > > > > > I added FailedReplicaRecord to reflect the case where a JBOD
> > directory
> > > > has
> > > > > > failed, leading to failed replicas.
> > > > > >
> > > > > > > 102. For debugging purposes, sometimes it's useful to read the
> > > > metadata
> > > > > > > topic using tools like console-consumer. Should we support that
> > and
> > > > if
> > > > > > so,
> > > > > > > how?
> > > > > > >
> > > > > >
> > > > > > For now, we have the ability to read the metadata logs with the
> > > > dump-logs
> > > > > > tool.  I think we will come up with some other tools in the future
> > as
> > > > we
> > > > > > get experience.
> > > > > >
> > > > > > > 200. "brokers which are fenced will not appear in
> > MetadataResponses.
> > > > The
> > > > > > > broker will not respond to these requests-- instead, it will
> > simply
> > > > > > > disconnect." If the controller is partitioned off from the
> > brokers,
> > > > this
> > > > > > > design will cause every broker to stop accepting new client
> > > > requests. In
> > > > > > > contrast, if ZK is partitioned off, the existing behavior is
> > that the
> > > > > > > brokers can continue to work based on the last known metadata.
> > So, I
> > > > am
> > > > > > not
> > > > > > > sure if we should change the existing behavior because of the
> > bigger
> > > > > > impact
> > > > > > > in the new one. Another option is to keep the existing behavior
> > and
> > > > > > expose
> > > > > > > a metric for fenced brokers so that the operator could be
> > alerted.
> > > > > > >
> > > > > >
> > > > > > I'm skeptical about how well running without ZK currently works.
> > > > However,
> > > > > > I will move the broker-side fencing into a follow-up KIP.  This
> > KIP is
> > > > > > already pretty large and there is no hard dependency on this.
> > There
> > > > may
> > > > > > also be other ways of accomplishing the positive effects of
> > > > > > broker-side fencing, so more discussion is needed.
> > > > > >
> > > > > > > 201. I read Ron's comment, but I am still not sure the benefit of
> > > > keeping
> > > > > > > broker.id and controller.id in meta.properties. It seems that
> > we are
> > > > > > just
> > > > > > > duplicating the same info in two places and have the additional
> > > > burden of
> > > > > > > making sure the values in the two places are consistent.
> > > > > > >
> > > > > >
> > > > > > I think the reasoning is that having broker.id protects us against
> > > > > > accidentally bringing up a broker with a disk from a different
> > > > broker.  I
> > > > > > don't feel strongly about this but it seemed simpler to keep it.
> > > > > >
> > > > > > > 202. controller.connect.security.protocol: Is this needed since
> > > > > > > controller.listener.names and listener.security.protocol.map
> > imply
> > > > the
> > > > > > > security protocol already?
> > > > > > >
> > > > > >
> > > > > > You're right, this isn't needed.  I'll remove it.
> > > > > >
> > > > > > > 203. registration.heartbeat.interval.ms: It defaults to 2k. ZK
> > uses
> > > > 1/3
> > > > > > of
> > > > > > > the session timeout for heartbeat. So, given the default 18k for
> > > > > > > registration.lease.timeout.ms, should we default
> > > > > > > registration.heartbeat.interval.ms to 6k?
> > > > > > >
> > > > > >
> > > > > > 6 seconds seems like a pretty long time between heartbeats.  It
> > might
> > > > be
> > > > > > useful to know when a broker is missing heartbeats, with less time
> > than
> > > > > > that.  I provisionally set it to 3 seconds (we can always change
> > > > later...)
> > > > > >
> > > > > > I also changed the names of these configurations to
> > > > > > "broker.heartbeat.interval.ms" and "broker.registration.timeout.ms"
> > > > > > to try to clarify them a bit.
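
As an illustration, a sketch of those settings with the values mentioned in
this thread (illustrative only, not authoritative defaults):

    import java.util.Properties;

    // Sketch only: 3 s heartbeat interval, 18 s registration timeout, as
    // discussed in this thread.
    final class HeartbeatConfigSketch {
        static Properties heartbeatConfigs() {
            Properties cfg = new Properties();
            cfg.setProperty("broker.heartbeat.interval.ms", "3000");
            cfg.setProperty("broker.registration.timeout.ms", "18000");
            return cfg;
        }
    }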
> > > > > >
> > > > > > > 204. "The highest metadata offset which the broker has not
> > reached."
> > > > It
> > > > > > > seems this should be "has reached".
> > > > > > >
> > > > > >
> > > > > > I changed this to "one more than the highest metadata offset which
> > the
> > > > > > broker has reached."
> > > > > >
> > > > > > > 205. UnfenceBrokerRecord and UnregisterBrokerRecord: To me, they
> > > > seem to
> > > > > > be
> > > > > > > the same. Do we need both?
> > > > > > >
> > > > > >
> > > > > > Unregistration means that the broker has been removed from the
> > cluster.
> > > > > > That is different than unfencing, which marks the broker as active.
> > > > > >
> > > > > > > 206. TopicRecord: The Deleting field is used to indicate that the
> > > > topic
> > > > > > is
> > > > > > > being deleted. I am wondering if this is really needed since
> > > > RemoveTopic
> > > > > > > already indicates the same thing.
> > > > > > >
> > > > > >
> > > > > > RemoveTopic is the last step, which scrubs all metadata about the
> > > > > > topic.  In order to get to that last step, the topic data needs to
> > > > > > be removed from all brokers (after each broker notices that the
> > > > > > topic is being deleted).
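
As an illustration, the deletion sequence described above as a sketch (the
phase names are mine, not the KIP's):

    // Sketch only: the deletion flow described above.
    enum TopicDeletionPhaseSketch {
        MARKED_DELETING,   // TopicRecord committed with Deleting=true
        REPLICAS_REMOVED,  // every broker has noticed the flag and deleted its local data
        METADATA_SCRUBBED  // RemoveTopic committed as the last step, scrubbing all metadata
    }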
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Wed, Dec 2, 2020 at 2:50 PM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > On Wed, Dec 2, 2020, at 14:07, Ron Dagostino wrote:
> > > > > > > > > Hi Colin.  Thanks for the updates.  It's now clear to me that
> > > > brokers
> > > > > > > > > keep their broker epoch for the life of their JVM -- they
> > > > register
> > > > > > > > > once, get their broker epoch in the response, and then never
> > > > > > > > > re-register again.  Brokers may get fenced, but they keep the
> > > > same
> > > > > > > > > broker epoch for the life of their JVM.  The incarnation ID
> > is
> > > > also
> > > > > > > > > kept for the life of the JVM but is generated by the broker
> > > > itself
> > > > > > > > > upon startup, and the combination of the two allows the
> > > > Controller to
> > > > > > > > > act idempotently if any previously-sent registration response
> > > > gets
> > > > > > > > > lost.  Makes sense.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks, Ron.  That's a good summary.
> > > > > > > >
> > > > > > > > > One thing I wonder about is if it might be helpful for the
> > > > broker to
> > > > > > > > > send the Cluster ID as determined from its meta.properties
> > file
> > > > in
> > > > > > its
> > > > > > > > > registration request.  Does it even make sense for the
> > broker to
> > > > > > > > > successfully register and enter the Fenced state if it has
> > the
> > > > wrong
> > > > > > > > > Cluster ID?
> > > > > > > >
> > > > > > > > Yeah, that's a good idea.  Let's have the broker pass its
> > > > > > > > cluster ID in the registration RPC, and then registration can
> > > > > > > > fail if the broker is configured for the wrong cluster.
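
As an illustration, a rough sketch of that check (the class name and the error
value are assumptions, not the KIP's final RPC definition):

    // Sketch only: registration fails when the broker's configured cluster ID
    // does not match the controller's.
    final class RegistrationClusterIdCheckSketch {
        enum Error { NONE, INCONSISTENT_CLUSTER_ID }  // error name is an assumption

        static Error checkClusterId(String controllerClusterId, String requestClusterId) {
            if (!controllerClusterId.equals(requestClusterId)) {
                return Error.INCONSISTENT_CLUSTER_ID;
            }
            return Error.NONE;
        }
    }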
> > > > > > > >
> > > > > > > > >  The nextMetadataOffset value that the broker communicates
> > > > > > > > > in its registration request only has meaning within the
> > correct
> > > > > > > > > cluster, so it feels to me that the Controller should have
> > some
> > > > way
> > > > > > to
> > > > > > > > > perform this sanity check.  There is currently (pre-KIP 500)
> > a
> > > > check
> > > > > > > > > in the broker to make sure its configured cluster ID matches
> > the
> > > > one
> > > > > > > > > stored in ZooKeeper, and we will have to perform this
> > validation
> > > > > > > > > somewhere in the KIP-500 world.  If the Controller doesn't
> > do it
> > > > > > > > > within the registration request then the broker will have to
> > > > make a
> > > > > > > > > metadata request to the Controller, retrieve the Cluster ID,
> > and
> > > > > > > > > perform the check itself.  It feels to me that it might be
> > > > better for
> > > > > > > > > the Controller to just do it, and then the broker doesn't
> > have to
> > > > > > > > > worry about it anymore once it successfully registers.
> > > > > > > > >
> > > > > > > > > I also have a question about the broker.id value and
> > > > > > meta.properties.
> > > > > > > > > The KIP now says "In version 0 of meta.properties, there is a
> > > > > > > > > broker.id field.  Version 1 does not have this field.  It
> > is no
> > > > > > longer
> > > > > > > > > needed because we no longer support dynamic broker id
> > > > assignment."
> > > > > > > > > But then there is an example version 1 meta.properties file
> > that
> > > > > > shows
> > > > > > > > > the broker.id value.  I actually wonder if maybe the
> > broker.id
> > > > value
> > > > > > > > > would be good to keep in the version 1 meta.properties file
> > > > because
> > > > > > it
> > > > > > > > > currently (pre-KIP 500, version 0) acts as a sanity check to
> > make
> > > > > > sure
> > > > > > > > > the broker is using the correct log directory.  Similarly
> > with
> > > > the
> > > > > > > > > controller.id value on controllers -- it would allow the
> > same
> > > > type
> > > > > > of
> > > > > > > > > sanity check for quorum controllers.
> > > > > > > > >
> > > > > > > >
> > > > > > > > That's a good point.  I will add broker.id back, and also add
> > > > > > > > controller.id as a possibility.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Nov 30, 2020 at 7:41 PM Colin McCabe <
> > cmcc...@apache.org
> > > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 23, 2020, at 16:10, Jun Rao wrote:
> > > > > > > > > > > Hi, Colin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the reply. A few more comments.
> > > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > Thanks again for the reply.  Sorry for the long hiatus.  I
> > was
> > > > on
> > > > > > > > vacation for a while.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 55. There is still text that favors new broker
> > registration.
> > > > > > "When a
> > > > > > > > broker
> > > > > > > > > > > first starts up, when it is in the INITIAL state, it will
> > > > always
> > > > > > > > "win"
> > > > > > > > > > > broker ID conflicts.  However, once it is granted a
> > lease, it
> > > > > > > > transitions
> > > > > > > > > > > out of the INITIAL state.  Thereafter, it may lose
> > subsequent
> > > > > > > > conflicts if
> > > > > > > > > > > its broker epoch is stale.  (See KIP-380 for some
> > background
> > > > on
> > > > > > > > broker
> > > > > > > > > > > epoch.)  The reason for favoring new processes is to
> > > > accommodate
> > > > > > the
> > > > > > > > common
> > > > > > > > > > > case where a process is killed with kill -9 and then
> > > > restarted.
> > > > > > We
> > > > > > > > want it
> > > > > > > > > > > to be able to reclaim its old ID quickly in this case."
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for the reminder.  I have clarified the language
> > here.
> > > > > > > > Hopefully now it is clear that we don't allow quick re-use of
> > > > broker
> > > > > > IDs.
> > > > > > > > > >
> > > > > > > > > > > 80.1 Sounds good. Could you document that listeners is a
> > > > required
> > > > > > > > config
> > > > > > > > > > > now? It would also be useful to annotate other required
> > > > configs.
> > > > > > For
> > > > > > > > > > > example, controller.connect should be required.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I added a note specifying that these are required.
> > > > > > > > > >
> > > > > > > > > > > 80.2 Could you list all deprecated existing configs?
> > Another
> > > > one
> > > > > > is
> > > > > > > > > > > control.plane.listener.name since the controller no
> > longer
> > > > sends
> > > > > > > > > > > LeaderAndIsr, UpdateMetadata and StopReplica requests.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I added a section specifying some deprecated configs.
> > > > > > > > > >
> > > > > > > > > > > 83.1 It seems that the broker can transition from FENCED
> > > > > > > > > > > to RUNNING without registering for a new broker epoch. I
> > > > > > > > > > > am not sure how this works. Once the controller fences a
> > > > > > > > > > > broker, there is no need for the controller to keep the
> > > > > > > > > > > broker epoch around. So, the fenced broker's heartbeat
> > > > > > > > > > > request with the existing broker epoch will be rejected,
> > > > > > > > > > > leading the broker back to the FENCED state again.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The broker epoch refers to the broker registration.  So we
> > DO
> > > > keep
> > > > > > the
> > > > > > > > broker epoch around even while the broker is fenced.
> > > > > > > > > >
> > > > > > > > > > The broker epoch changes only when there is a new broker
> > > > > > > > registration.  Fencing or unfencing the broker doesn't change
> > the
> > > > > > broker
> > > > > > > > epoch.
> > > > > > > > > >
> > > > > > > > > > > 83.5 Good point on KIP-590. Then should we expose the
> > > > controller
> > > > > > for
> > > > > > > > > > > debugging purposes? If not, we should deprecate the
> > > > controllerID
> > > > > > > > field in
> > > > > > > > > > > MetadataResponse?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think it's OK to expose it for now, with the proviso
> > that it
> > > > > > won't
> > > > > > > > be reachable by clients.
> > > > > > > > > >
> > > > > > > > > > > 90. We rejected the shared ID with just one reason "This
> > is
> > > > not a
> > > > > > > > good idea
> > > > > > > > > > > because NetworkClient assumes a single ID space.  So if
> > > > there is
> > > > > > > > both a
> > > > > > > > > > > controller 1 and a broker 1, we don't have a way of
> > picking
> > > > the
> > > > > > > > "right"
> > > > > > > > > > > one." This doesn't seem to be a strong reason. For
> > example,
> > > > we
> > > > > > could
> > > > > > > > > > > address the NetworkClient issue with the node type as you
> > > > pointed
> > > > > > > > out or
> > > > > > > > > > > using the negative value of a broker ID as the
> > controller ID.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It would require a lot of code changes to support multiple
> > > > types of
> > > > > > > > node IDs.  It's not clear to me that the end result would be
> > > > better --
> > > > > > I
> > > > > > > > tend to think it would be worse, since it would be more
> > complex.
> > > > In a
> > > > > > > > similar vein, using negative numbers seems dangerous, since we
> > use
> > > > > > > > negatives or -1 as "special values" in many places.  For
> > example,
> > > > -1
> > > > > > often
> > > > > > > > represents "no such node."
> > > > > > > > > >
> > > > > > > > > > One important thing to keep in mind is that we want to be
> > able
> > > > to
> > > > > > > > transition from a broker and a controller being co-located to
> > them
> > > > no
> > > > > > > > longer being co-located.  This is much easier to do when they
> > have
> > > > > > separate
> > > > > > > > IDs.
> > > > > > > > > >
> > > > > > > > > > > 100. In KIP-589
> > > > > > > > > > > <
> > > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > >,
> > > > > > > > > > > the broker reports all offline replicas due to a disk
> > > > failure to
> > > > > > the
> > > > > > > > > > > controller. It seems this information needs to be
> > persisted
> > > > to
> > > > > > the
> > > > > > > > > > > metadata
> > > > > > > > > > > log. Do we have a corresponding record for that?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hmm, I have to look into this a little bit more.  We may
> > need
> > > > a new
> > > > > > > > record type.
> > > > > > > > > >
> > > > > > > > > > > 101. Currently, StopReplica request has 2 modes, without
> > > > deletion
> > > > > > > > and with
> > > > > > > > > > > deletion. The former is used for controlled shutdown and
> > > > handling
> > > > > > > > disk
> > > > > > > > > > > failure, and causes the follower to stop. The latter is
> > for
> > > > topic
> > > > > > > > deletion
> > > > > > > > > > > and partition reassignment, and causes the replica to be
> > > > deleted.
> > > > > > > > Since we
> > > > > > > > > > > are deprecating StopReplica, could we document what
> > triggers
> > > > the
> > > > > > > > stopping
> > > > > > > > > > > of a follower and the deleting of a replica now?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > RemoveTopic triggers deletion.  In general the
> > functionality of
> > > > > > > > StopReplica is subsumed by the metadata records.
> > > > > > > > > >
> > > > > > > > > > > 102. Should we include the metadata topic in the
> > > > > > MetadataResponse?
> > > > > > > > If so,
> > > > > > > > > > > when it will be included and what will the metadata
> > response
> > > > look
> > > > > > > > like?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > No, it won't be included in the metadata response sent back
> > > > from
> > > > > > the
> > > > > > > > brokers.
> > > > > > > > > >
> > > > > > > > > > > 103. "The active controller assigns the broker a new
> > broker
> > > > > > epoch,
> > > > > > > > based on
> > > > > > > > > > > the latest committed offset in the log." This seems
> > > > inaccurate
> > > > > > since
> > > > > > > > the
> > > > > > > > > > > latest committed offset doesn't always advance on every
> > log
> > > > > > append.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Given that the new broker epoch won't be visible until the
> > > > commit
> > > > > > has
> > > > > > > > happened, I have changed this to "the next available offset in
> > the
> > > > log"
> > > > > > > > > >
> > > > > > > > > > > 104. REGISTERING(1) : It says "Otherwise, the broker
> > moves
> > > > into
> > > > > > the
> > > > > > > > FENCED
> > > > > > > > > > > state.". It seems this should be RUNNING?
> > > > > > > > > > >
> > > > > > > > > > > 105. RUNNING: Should we require the broker to catch up
> > to the
> > > > > > > > metadata log
> > > > > > > > > > > to get into this state?
> > > > > > > > > >
> > > > > > > > > > For 104 and 105, these sections have been reworked.
> > > > > > > > > >
> > > > > > > > > > best,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe <
> > > > cmcc...@apache.org
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > > > > > > > > > > Hi Colin.  Thanks for the hard work on this KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have some questions about what happens to a
> > broker
> > > > > > when it
> > > > > > > > becomes
> > > > > > > > > > > > > > > fenced (e.g. because it can't send a heartbeat
> > > > request to
> > > > > > > > keep its
> > > > > > > > > > > > > > > lease).  The KIP says "When a broker is fenced,
> > it
> > > > cannot
> > > > > > > > process any
> > > > > > > > > > > > > > > client requests.  This prevents brokers which
> > are not
> > > > > > > > receiving
> > > > > > > > > > > > > > > metadata updates or that are not receiving and
> > > > processing
> > > > > > > > them fast
> > > > > > > > > > > > > > > enough from causing issues to clients." And in
> > the
> > > > > > > > description of the
> > > > > > > > > > > > > > > FENCED(4) state it likewise says "While in this
> > > > state,
> > > > > > the
> > > > > > > > broker
> > > > > > > > > > > > does
> > > > > > > > > > > > > > > not respond to client requests."  It makes sense
> > > > that a
> > > > > > > > fenced broker
> > > > > > > > > > > > > > > should not accept producer requests -- I assume
> > any
> > > > such
> > > > > > > > requests
> > > > > > > > > > > > > > > would result in NotLeaderOrFollowerException.
> > But
> > > > what
> > > > > > > > about KIP-392
> > > > > > > > > > > > > > > (fetch from follower) consumer requests?  It is
> > > > > > conceivable
> > > > > > > > that
> > > > > > > > > > > > these
> > > > > > > > > > > > > > > could continue.  Related to that, would a fenced
> > > > broker
> > > > > > > > continue to
> > > > > > > > > > > > > > > fetch data for partitions where it thinks it is a
> > > > > > follower?
> > > > > > > > Even if
> > > > > > > > > > > > > > > it rejects consumer requests it might still
> > continue
> > > > to
> > > > > > > > fetch as a
> > > > > > > > > > > > > > > follower.  Might it be helpful to clarify both
> > > > decisions
> > > > > > > > here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Ron,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Good question.  I think a fenced broker should
> > > > continue to
> > > > > > > > fetch on
> > > > > > > > > > > > > > partitions it was already fetching before it was
> > > > fenced,
> > > > > > > > unless it
> > > > > > > > > > > > hits a
> > > > > > > > > > > > > > problem.  At that point it won't be able to
> > continue,
> > > > > > since it
> > > > > > > > doesn't
> > > > > > > > > > > > have
> > > > > > > > > > > > > > the new metadata.  For example, it won't know about
> > > > > > leadership
> > > > > > > > changes
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the partitions it's fetching.  The rationale for
> > > > > > continuing to
> > > > > > > > fetch
> > > > > > > > > > > > is to
> > > > > > > > > > > > > > try to avoid disruptions as much as possible.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think fenced brokers should accept client
> > > > requests.
> > > > > > > > The issue
> > > > > > > > > > > > is
> > > > > > > > > > > > > > that the fenced broker may or may not have any
> > data it
> > > > is
> > > > > > > > supposed to
> > > > > > > > > > > > > > have.  It may or may not have applied any
> > configuration
> > > > > > > > changes, etc.
> > > > > > > > > > > > that
> > > > > > > > > > > > > > it is supposed to have applied.  So it could get
> > pretty
> > > > > > > > confusing, and
> > > > > > > > > > > > also
> > > > > > > > > > > > > > potentially waste the client's time.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > When fenced, how would the broker reply to a client
> > > > which did
> > > > > > > > make a
> > > > > > > > > > > > > request?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > >
> > > > > > > > > > > > The broker will respond with a retryable error in that
> > > > case.
> > > > > > Once
> > > > > > > > the
> > > > > > > > > > > > client has re-fetched its metadata, it will no longer
> > see
> > > > the
> > > > > > > > fenced broker
> > > > > > > > > > > > as part of the cluster.  I added a note to the KIP.
> > > > > > > > > > > >
> > > > > > > > > > > > best,
> > > > > > > > > > > > Colin
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tom
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
