Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

Jun Rao Thu, 17 Dec 2020 10:20:10 -0800

Hi, Colin,

Thanks for the reply.


211. Hmm, I still don't see a clear benefit of registering a broker before
recovery. It's possible for the recovery to take time. However, during the
recovery mode, it seems the broker will still be in the fenced mode and
won't be able to do work for the clients. So, registering and heartbeating
early seems to only add unnecessary overhead. For your point on topic
creation, I thought we now allow replicas to be created on
unregistered brokers.

230. Currently, we do have a ControllerId field in MetadataResponse. In the
early discussion, I thought that we want to expose the controller for
debugging purposes, but not used by the client library.

Jun

On Wed, Dec 16, 2020 at 9:13 PM Colin McCabe <[email protected]> wrote:

> On Wed, Dec 16, 2020, at 18:13, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply. Just a couple of more comments.
> >
> > 211. Currently, the broker only registers itself in ZK after log
> recovery.
> > Is there any benefit to change that? As you mentioned, the broker can't
> do
> > much before completing log recovery.
> >
>
> Hi Jun,
>
> Previously, it wasn't possible to register in ZK without immediately
> getting added to the MetadataResponse.  So I think that's the main reason
> why registration was delayed until after log recovery.  Since that
> constraint doesn't exist any more, there seems to be no reason to delay
> registration.
>
> I think delaying registration would have some major downsides.  If log
> recovery takes a while, that's a longer window during which someone else
> could register a broker with the same ID.  Having parts of the cluster
> missing for a while gives up some of the benefit of separating registration
> from fencing.  For example, if a broker somehow gets unregistered and we
> want to re-register it, but we have to wait for a 10 minute log recovery to
> do that, that could be a window during which topics can't be created that
> need to be on that broker, and so forth.
>
> > 230. Regarding MetadataResponse, there is a slight awkwardness. We return
> > rack for each node. However, if that node is for the controller, the rack
> > field is not really relevant. Should we clean it up here or in another
> KIP
> > like KIP-700?
>
> Oh, controllers don't appear in the MetadataResponses returned to clients,
> since clients can't access them.  I should have been more clear about that
> in the KIP-- I added a sentence to "Networking" describing this.
>
> best,
> Colin
>
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe <[email protected]> wrote:
> >
> > > On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few follow up comments.
> > > >
> > > > 211. When does the broker send the BrokerRegistration request to the
> > > > controller? Is it after the recovery phase? If so, at that point, the
> > > > broker has already caught up on the metadata (in order to clean up
> > > deleted
> > > > partitions). Then, it seems we don't need the ShouldFence field
> > > > in BrokerHeartbeatRequest?
> > >
> > > Hi Jun,
> > >
> > > Thanks again for the reviews.
> > >
> > > The broker sends the registration request as soon as it starts up.  It
> > > cannot wait until the recovery phase is over since sometimes log
> recovery
> > > can take quite a long time.
> > >
> > > >
> > > > 213. kafka-cluster.sh
> > > > 213.1 For the decommision example, should the command take a broker
> id?
> > >
> > > Yes, the example should have a broker id.  Fixed.
> > >
> > > > 213.2 Existing tools use the "--" command line option (e.g.
> kafka-topics
> > > > --list --topic test). Should we follow the same convention
> > > > for kafka-cluster.sh (and kafka-storage.sh)?
> > >
> > > Hmm.  I don't think argparse4j supports using double dashes to identify
> > > subcommands.  I think it might be confusing as well, since the
> subcommand
> > > must come first, unlike a plain old argument which can be anywhere on
> the
> > > command line.
> > >
> > > > 213.3 Should we add an admin api for broker decommission so that
> this can
> > > > be done programmatically?
> > > >
> > >
> > > Yes, there is an admin client API for decommissioning as well.
> > >
> > > > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the
> request was
> > > > sent to is not the controller". I thought the clients never send a
> > > request
> > > > to the controller directly now and the broker will forward it to the
> > > > controller?
> > > >
> > >
> > > If the controller moved recently, it's possible that the broker could
> send
> > > to a controller that has just recently become inactive.  In that case
> > > NOT_CONTROLLER would be returned.  (A standby controller returns
> > > NOT_CONTROLLER for most APIs).
> > >
> > > > 221. Could we add the required ACL for the new requests?
> > > >
> > >
> > > Good point.  I added the required ACL for each new RPC.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > > Jun
> > > >
> > > > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe <[email protected]>
> > > wrote:
> > > >
> > > > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > > > Hi, Colin,
> > > > > >
> > > > > > Thanks for the reply. A few more comments below.
> > > > > >
> > > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the comments.
> > > > >
> > > > > > 206. "RemoveTopic is the last step, that scrubs all metadata
> about
> > > the
> > > > > > topic.  In order to get to that last step, the topic data needs
> to
> > > > > removed
> > > > > > from all brokers (after each broker notices that the topic is
> being
> > > > > > deleted)." Currently, this is done based on the response of
> > > > > > StopReplicaRequest. Since the controller no longer sends this
> > > request,
> > > > > how
> > > > > > does the controller know that the data for the deleted topic has
> > > > > > been removed in the brokers?
> > > > > >
> > > > >
> > > > > That's a good point... definitely an oversight.
> > > > >
> > > > > It seems complex to force the controller to track when log
> directories
> > > > > have been deleted.  Let's just assume that KIP-516 has been
> > > implemented,
> > > > > and track them by UUID.  Then we can just have a single topic
> deletion
> > > > > record.
> > > > >
> > > > > I added a section on "topic identifiers" describing this.
> > > > >
> > > > > > 210. Thanks for the explanation. Sounds good to me.
> > > > > >
> > > > > > 211. I still don't see an example when ShouldFence is set to
> true in
> > > > > > BrokerHeartbeatReques. Could we add one?
> > > > > >
> > > > >
> > > > > It is sent to true when the broker is first starting up and
> doesn't yet
> > > > > want to be unfenced.  I added a longer explanation of this in the
> > > "Broker
> > > > > Leases" section.
> > > > >
> > > > > > 213. The KIP now allows replicas to be assigned on brokers that
> are
> > > > > fenced,
> > > > > > which is an improvement. How do we permanently remove a broker
> (e.g.
> > > > > > cluster shrinking) to prevent it from being used for future
> replica
> > > > > > assignments?
> > > > > >
> > > > >
> > > > > This is a fair point.  I will create a kafka-cluster.sh script
> which
> > > can
> > > > > do this, plus a DecommissionBrokerRequest.
> > > > >
> > > > > As a bonus the kafka-cluster.sh script can help people find the
> > > cluster ID
> > > > > of brokers, something that people have wanted a tool for in the
> past.
> > > > >
> > > > > > 214. "Currently, when a node is down, all of its ZK registration
> > > > > > information is gone.  But  we need this information in order to
> > > > > understand
> > > > > > things like whether the replicas of a particular partition are
> > > > > > well-balanced across racks." This is not quite true. Currently,
> even
> > > when
> > > > > > ZK registration is gone, the existing replica assignment is still
> > > > > available
> > > > > > in the metadata response.
> > > > > >
> > > > >
> > > > > I agree that the existing replica assignment is still available.
> But
> > > that
> > > > > just tells you partition X is on nodes A, B, and C.  If you don't
> have
> > > the
> > > > > ZK registration for one or more of A, B, or C then you don't know
> > > whether
> > > > > we are following the policy of "two replicas on one rack, one
> replica
> > > on
> > > > > another."  Or any other more complex rack placement policy that you
> > > might
> > > > > have.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe <[email protected]
> >
> > > wrote:
> > > > > >
> > > > > > > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > > > > > > Hi, Colin,
> > > > > > > >
> > > > > > > > Thanks for the reply. A few more follow up comments.
> > > > > > > >
> > > > > > > > 210. initial.broker.registration.timeout.ms: The default
> value
> > > is
> > > > > 90sec,
> > > > > > > > which seems long. If a broker fails the registration because
> of
> > > > > incorrect
> > > > > > > > configs, we want to fail faster. In comparison, the defaults
> for
> > > > > > > > zookeeper.connection.timeout.ms is 18 secs.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I agree that the initial connection timeout here is longer than
> > > what we
> > > > > > > had with ZK.  The main reason I selected a slightly longer
> timeout
> > > > > here is
> > > > > > > to handle the case where the controllers and the brokers are
> > > > > co-located.
> > > > > > > For example, if you have a 3 node cluster and all three nodes
> are
> > > > > > > controllers+brokers, when you first bring up the cluster, we
> will
> > > have
> > > > > to
> > > > > > > stand up the controller quorum and then handle broker
> > > registrations.
> > > > > > > Although we believe Raft will start up relatively quickly, it's
> > > good to
> > > > > > > leave some extra margin here.
> > > > > > >
> > > > > > > I don't think there's a big disadvantage to having a slightly
> > > longer
> > > > > > > timeout here.  After all, starting up the brokers with no
> > > controllers
> > > > > is an
> > > > > > > admin mistake, which we don't expect to see very often.  Maybe
> > > let's
> > > > > set it
> > > > > > > to 60 seconds for now and maybe see if we can tweak it in the
> > > future if
> > > > > > > that turns out to be too long or short?
> > > > > > >
> > > > > > > > 211. "Once they are all moved, the controller responds to the
> > > > > heartbeat
> > > > > > > > with a nextState of SHUTDOWN." It seems that nextState is no
> > > longer
> > > > > > > > relevant. Also, could you add a bit more explanation on when
> > > > > ShouldFence
> > > > > > > is
> > > > > > > > set to true in BrokerHeartbeatRequest?
> > > > > > > >
> > > > > > >
> > > > > > > Good catch.  I removed the obsolete section referring to
> nextState
> > > and
> > > > > > > added a reference to the new boolean.  I also added some more
> > > > > information
> > > > > > > about ShouldFence and about the rationale for separating
> fencing
> > > from
> > > > > > > registration.
> > > > > > >
> > > > > > > > 212. Related to the above, what does the broker do if
> IsFenced is
> > > > > true in
> > > > > > > > the BrokerHeartbeatResponse? Will the broker do the same
> thing on
> > > > > > > receiving
> > > > > > > > a FenceBrokerRecord from the metadata log?
> > > > > > > >
> > > > > > >
> > > > > > > The broker only checks this boolean during the startup process.
> > > After
> > > > > the
> > > > > > > startup process is finished, it ignores this boolean.
> > > > > > >
> > > > > > > The broker uses fence / unfence records in the metadata log to
> > > > > determine
> > > > > > > which brokers should appear in its MetadataResponse.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe <
> [email protected]
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > > > > > > > > Hi Colin,
> > > > > > > > > >
> > > > > > > > > > The KIP says that "brokers which are fenced will not
> appear
> > > in
> > > > > > > > > > MetadataResponses.  So clients that have up-to-date
> metadata
> > > will
> > > > > > > not try
> > > > > > > > > > to contact fenced brokers.", which is fine, but it
> doesn't
> > > seem
> > > > > to
> > > > > > > > > > elaborate on what happens for clients which try to
> contact a
> > > > > broker
> > > > > > > > > (using
> > > > > > > > > > stale metadata, for example) and find it in the
> > > > > registered-but-fenced
> > > > > > > > > > state.
> > > > > > > > >
> > > > > > > > > Hi Tom,
> > > > > > > > >
> > > > > > > > > I have dropped the broker-side fencing from this
> proposal.  So
> > > now
> > > > > the
> > > > > > > > > fencing is basically the same as today's fencing: it's
> > > > > controller-side
> > > > > > > > > only.  That means that clients using stale metadata can
> contact
> > > > > fenced
> > > > > > > > > brokers and communicate with them.  The only case where the
> > > broker
> > > > > > > will be
> > > > > > > > > inaccessible is when it hasn't finished starting yet
> (i.e., has
> > > > > not yet
> > > > > > > > > attained RUNNING state.)
> > > > > > > > >
> > > > > > > > > Just like today, we expect the safeguards built into the
> > > > > replication
> > > > > > > > > protocol to prevent the worst corner cases that could
> result
> > > from
> > > > > > > this.  I
> > > > > > > > > do think we will probably take up this issue later and
> find a
> > > > > better
> > > > > > > > > solution for client-side metadata, but this KIP is big
> enough
> > > > > as-is.
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > You said previously that the broker will respond with a
> > > retriable
> > > > > > > > > > error, which is again fine, but you didn't answer my
> question
> > > > > about
> > > > > > > which
> > > > > > > > > > error code you've chosen for this. I realise that this
> > > doesn't
> > > > > really
> > > > > > > > > > affect the correctness of the behaviour, but I'm not
> aware
> > > of any
> > > > > > > > > existing
> > > > > > > > > > error code which is a good fit. So it would be good to
> > > understand
> > > > > > > about
> > > > > > > > > how
> > > > > > > > > > you're making this backward compatible for clients.
> > > > > > > > > >
> > > > > > > > > > Many thanks,
> > > > > > > > > >
> > > > > > > > > > Tom
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe <
> > > [email protected]
> > > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply. Just a couple of more comments
> > > below.
> > > > > > > > > > > >
> > > > > > > > > > > > 210. Since we are deprecating
> > > > > zookeeper.connection.timeout.ms,
> > > > > > > > > should we
> > > > > > > > > > > > add a new config to bound the time for a broker to
> > > connect
> > > > > to the
> > > > > > > > > > > > controller during starting up?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Good idea.  I added
> initial.broker.registration.timeout.ms
> > > for
> > > > > > > this.
> > > > > > > > > > >
> > > > > > > > > > > > 211. BrokerHeartbeat no longer has the state field
> in the
> > > > > > > > > > > request/response.
> > > > > > > > > > > > However, (a) the controller shutdown section still
> has
> > > "In
> > > > > its
> > > > > > > > > periodic
> > > > > > > > > > > > heartbeats, the broker asks the controller if it can
> > > > > transition
> > > > > > > into
> > > > > > > > > the
> > > > > > > > > > > > SHUTDOWN state.  This motivates the controller to
> move
> > > all
> > > > > of the
> > > > > > > > > leaders
> > > > > > > > > > > > off of that broker.  Once they are all moved, the
> > > controller
> > > > > > > > > responds to
> > > > > > > > > > > > the heartbeat with a nextState of SHUTDOWN."; (2) the
> > > > > > > description of
> > > > > > > > > > > > BrokerHeartbeat still references currentState and
> > > > > targetState.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.  I've made these sections clearer and removed
> the
> > > > > obsolete
> > > > > > > > > > > references to sending states.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Dec 11, 2020 at 1:33 PM Colin McCabe <
> > > > > [email protected]
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Dec 9, 2020, at 10:10, Jun Rao wrote:
> > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the update. A few more follow up
> comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks again for the review.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100. FailedReplicaRecord: Since this is reported
> by
> > > each
> > > > > > > broker
> > > > > > > > > > > > > > independently, perhaps we could use a more
> concise
> > > > > > > representation
> > > > > > > > > > > that
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > a top level broker field, an array of topics,
> which
> > > has
> > > > > an
> > > > > > > array
> > > > > > > > > of
> > > > > > > > > > > > > > partitions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The issue is that there is a size limit on the each
> > > record.
> > > > > > > > > Putting
> > > > > > > > > > > all
> > > > > > > > > > > > > of the partitions of a log directory into a single
> > > record
> > > > > would
> > > > > > > > > > > probably
> > > > > > > > > > > > > break that in many cases.  Still, we can optimize a
> > > bit by
> > > > > > > having
> > > > > > > > > an
> > > > > > > > > > > array
> > > > > > > > > > > > > of partition IDs, since nearly all the time, we
> have
> > > more
> > > > > than
> > > > > > > one
> > > > > > > > > > > from the
> > > > > > > > > > > > > same topic.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 200. Sounds good. If we remove the broker-side
> > > fencing
> > > > > > > logic, do
> > > > > > > > > we
> > > > > > > > > > > plan
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > still keep FENCED in broker state? Do we plan to
> > > expose
> > > > > the
> > > > > > > new
> > > > > > > > > > > states
> > > > > > > > > > > > > > through the existing BrokerState metric and if
> so,
> > > what
> > > > > are
> > > > > > > the
> > > > > > > > > > > values
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > the new states?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > No, we don't need FENCED any more.  I have removed
> it
> > > from
> > > > > the
> > > > > > > KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The new states are very similar to the current
> ones,
> > > > > actually.
> > > > > > > > > There
> > > > > > > > > > > are
> > > > > > > > > > > > > no new states or removed ones.  The main change in
> the
> > > > > broker
> > > > > > > state
> > > > > > > > > > > machine
> > > > > > > > > > > > > is that the RECOVERING_FROM_UNCLEAN_SHUTDOWN state
> has
> > > been
> > > > > > > > > renamed to
> > > > > > > > > > > > > RECOVERY.  Also, unlike previously, the broker will
> > > always
> > > > > pass
> > > > > > > > > through
> > > > > > > > > > > > > RECOVERY (although it may only stay in this state
> for a
> > > > > very
> > > > > > > short
> > > > > > > > > > > amount
> > > > > > > > > > > > > of time).
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 201. This may be fine too. Could we document what
> > > happens
> > > > > > > when
> > > > > > > > > the
> > > > > > > > > > > > > > broker.id/controller.id in metadata.properties
> don't
> > > > > match
> > > > > > > the
> > > > > > > > > > > broker
> > > > > > > > > > > > > > config when the broker starts up?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I added some documentation about this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 204. There is still "The highest metadata offset
> > > which
> > > > > the
> > > > > > > broker
> > > > > > > > > > > has not
> > > > > > > > > > > > > > reached" referenced under BrokerRegistration.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > It should be CurrentMetadataOffset.  Fixed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 206. Is that separate step needed given KIP-516?
> With
> > > > > > > KIP-516 (
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsr
> > > > > > > > > > > > > ),
> > > > > > > > > > > > > > we don't need to wait for the topic data to be
> > > removed
> > > > > from
> > > > > > > all
> > > > > > > > > > > brokers
> > > > > > > > > > > > > > before removing the topic metadata. The
> combination
> > > of
> > > > > > > unmatching
> > > > > > > > > > > > > > topicId
> > > > > > > > > > > > > > or the missing topicId from the metadata is
> enough
> > > for
> > > > > the
> > > > > > > > > broker to
> > > > > > > > > > > > > > clean
> > > > > > > > > > > > > > up deleted topics asynchronously.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It won't be needed once KIP-516 is adopted, but
> this
> > > hasn't
> > > > > > > been
> > > > > > > > > > > > > implemented yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > > best,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Dec 8, 2020 at 5:27 PM Colin McCabe <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Dec 3, 2020, at 16:37, Jun Rao wrote:
> > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the updated KIP. A few more
> comments
> > > > > below.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks again for the reviews.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 80.2 For deprecated configs, we need to
> include
> > > > > > > zookeeper.*
> > > > > > > > > and
> > > > > > > > > > > > > > > > broker.id.generation.enable.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Added.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 83.1 If a broker is down, does the controller
> > > keep
> > > > > the
> > > > > > > > > previously
> > > > > > > > > > > > > > > > registered broker epoch forever? If not, how
> long
> > > > > does
> > > > > > > the
> > > > > > > > > > > controller
> > > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > it? What does the controller do when
> receiving a
> > > > > broker
> > > > > > > > > heartbeat
> > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > with an unfound broker epoch?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, the controller keeps the previous
> registration
> > > > > > > forever.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Broker heartbeat requests with an incorrect
> broker
> > > > > epoch
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > > rejected
> > > > > > > > > > > > > > > with STALE_BROKER_EPOCH.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 100. Have you figured out if we need to add
> a new
> > > > > record
> > > > > > > > > type for
> > > > > > > > > > > > > > > reporting
> > > > > > > > > > > > > > > > partitions on failed disks?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I added FailedReplicaRecord to reflect the case
> > > where a
> > > > > > > JBOD
> > > > > > > > > > > directory
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > > failed, leading to failed replicas.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 102. For debugging purposes, sometimes it's
> > > useful to
> > > > > > > read
> > > > > > > > > the
> > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > topic using tools like console-consumer.
> Should
> > > we
> > > > > > > support
> > > > > > > > > that
> > > > > > > > > > > and
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > so,
> > > > > > > > > > > > > > > > how?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For now, we have the ability to read the
> metadata
> > > logs
> > > > > > > with the
> > > > > > > > > > > > > dump-logs
> > > > > > > > > > > > > > > tool.  I think we will come up with some other
> > > tools
> > > > > in the
> > > > > > > > > future
> > > > > > > > > > > as
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > get experience.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 200. "brokers which are fenced will not
> appear in
> > > > > > > > > > > MetadataResponses.
> > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > broker will not respond to these requests--
> > > instead,
> > > > > it
> > > > > > > will
> > > > > > > > > > > simply
> > > > > > > > > > > > > > > > disconnect." If the controller is
> partitioned off
> > > > > from
> > > > > > > the
> > > > > > > > > > > brokers,
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > design will cause every broker to stop
> accepting
> > > new
> > > > > > > client
> > > > > > > > > > > > > requests. In
> > > > > > > > > > > > > > > > contrast, if ZK is partitioned off, the
> existing
> > > > > > > behavior is
> > > > > > > > > > > that the
> > > > > > > > > > > > > > > > brokers can continue to work based on the
> last
> > > known
> > > > > > > > > metadata.
> > > > > > > > > > > So, I
> > > > > > > > > > > > > am
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > sure if we should change the existing
> behavior
> > > > > because
> > > > > > > of the
> > > > > > > > > > > bigger
> > > > > > > > > > > > > > > impact
> > > > > > > > > > > > > > > > in the new one. Another option is to keep the
> > > > > existing
> > > > > > > > > behavior
> > > > > > > > > > > and
> > > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > a metric for fenced brokers so that the
> operator
> > > > > could be
> > > > > > > > > > > alerted.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm skeptical about how well running without ZK
> > > > > currently
> > > > > > > > > works.
> > > > > > > > > > > > > However,
> > > > > > > > > > > > > > > I will move the broker-side fencing into a
> > > follow-up
> > > > > KIP.
> > > > > > > This
> > > > > > > > > > > KIP is
> > > > > > > > > > > > > > > already pretty large and there is no hard
> > > dependency on
> > > > > > > this.
> > > > > > > > > > > There
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > also be other ways of accomplishing the
> positive
> > > > > effects of
> > > > > > > > > what
> > > > > > > > > > > > > > > broker-side fencing, so more discussion is
> needed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 201. I read Ron's comment, but I am still not
> > > sure
> > > > > the
> > > > > > > > > benefit of
> > > > > > > > > > > > > keeping
> > > > > > > > > > > > > > > > broker.id and controller.id in
> meta.properties.
> > > It
> > > > > seems
> > > > > > > > > that
> > > > > > > > > > > we are
> > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > duplicating the same info in two places and
> have
> > > the
> > > > > > > > > additional
> > > > > > > > > > > > > burden of
> > > > > > > > > > > > > > > > making sure the values in the two places are
> > > > > consistent.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think the reasoning is that having broker.id
> > > > > protects us
> > > > > > > > > against
> > > > > > > > > > > > > > > accidentally bringing up a broker with a disk
> from
> > > a
> > > > > > > different
> > > > > > > > > > > > > broker.  I
> > > > > > > > > > > > > > > don't feel strongly about this but it seemed
> > > simpler to
> > > > > > > keep
> > > > > > > > > it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 202. controller.connect.security.protocol: Is
> > > this
> > > > > needed
> > > > > > > > > since
> > > > > > > > > > > > > > > > controller.listener.names and
> > > > > > > listener.security.protocol.map
> > > > > > > > > > > imply
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > security protocol already?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You're right, this isn't needed.  I'll remove
> it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 203. registration.heartbeat.interval.ms: It
> > > > > defaults to
> > > > > > > 2k.
> > > > > > > > > ZK
> > > > > > > > > > > uses
> > > > > > > > > > > > > 1/3
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > the session timeout for heartbeat. So, given
> the
> > > > > default
> > > > > > > 18k
> > > > > > > > > for
> > > > > > > > > > > > > > > > registration.lease.timeout.ms, should we
> default
> > > > > > > > > > > > > > > > registration.heartbeat.interval.ms to 6k?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 6 seconds seems like a pretty long time between
> > > > > > > heartbeats.  It
> > > > > > > > > > > might
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > useful to know when a broker is missing
> heartbeats,
> > > > > with
> > > > > > > less
> > > > > > > > > time
> > > > > > > > > > > than
> > > > > > > > > > > > > > > that.  I provisionally set it to 3 seconds (we
> can
> > > > > always
> > > > > > > > > change
> > > > > > > > > > > > > later...)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I also changed the name of these
> configurations to
> > > "
> > > > > > > > > > > > > > > broker.heartbeat.interval.ms" and "
> > > > > > > > > broker.registration.timeout.ms"
> > > > > > > > > > > to
> > > > > > > > > > > > > try
> > > > > > > > > > > > > > > to clarify them a bit.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 204. "The highest metadata offset which the
> > > broker
> > > > > has
> > > > > > > not
> > > > > > > > > > > reached."
> > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > seems this should be "has reached".
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I changed this to "one more than the highest
> > > metadata
> > > > > > > offset
> > > > > > > > > which
> > > > > > > > > > > the
> > > > > > > > > > > > > > > broker has reached."
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 205. UnfenceBrokerRecord and
> > > UnregisterBrokerRecord:
> > > > > To
> > > > > > > me,
> > > > > > > > > they
> > > > > > > > > > > > > seem to
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > the same. Do we need both?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Unregistration means that the broker has been
> > > removed
> > > > > from
> > > > > > > the
> > > > > > > > > > > cluster.
> > > > > > > > > > > > > > > That is different than unfencing, which marks
> the
> > > > > broker as
> > > > > > > > > active.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 206. TopicRecord: The Deleting field is used
> to
> > > > > indicate
> > > > > > > > > that the
> > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > being deleted. I am wondering if this is
> really
> > > > > needed
> > > > > > > since
> > > > > > > > > > > > > RemoveTopic
> > > > > > > > > > > > > > > > already indicates the same thing.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > RemoveTopic is the last step, that scrubs all
> > > metadata
> > > > > > > about
> > > > > > > > > the
> > > > > > > > > > > topic.
> > > > > > > > > > > > > > > In order to get to that last step, the topic
> data
> > > > > needs to
> > > > > > > > > removed
> > > > > > > > > > > > > from all
> > > > > > > > > > > > > > > brokers (after each broker notices that the
> topic
> > > is
> > > > > being
> > > > > > > > > > > deleted).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Dec 2, 2020 at 2:50 PM Colin McCabe <
> > > > > > > > > [email protected]>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Dec 2, 2020, at 14:07, Ron
> Dagostino
> > > wrote:
> > > > > > > > > > > > > > > > > > Hi Colin.  Thanks for the updates.  It's
> now
> > > > > clear
> > > > > > > to me
> > > > > > > > > that
> > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > keep their broker epoch for the life of
> their
> > > > > JVM --
> > > > > > > they
> > > > > > > > > > > > > register
> > > > > > > > > > > > > > > > > > once, get their broker epoch in the
> > > response, and
> > > > > > > then
> > > > > > > > > never
> > > > > > > > > > > > > > > > > > re-register again.  Brokers may get
> fenced,
> > > but
> > > > > they
> > > > > > > > > keep the
> > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > broker epoch for the life of their JVM.
> The
> > > > > > > incarnation
> > > > > > > > > ID
> > > > > > > > > > > is
> > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > kept for the life of the JVM but is
> > > generated by
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > > > itself
> > > > > > > > > > > > > > > > > > upon startup, and the combination of the
> two
> > > > > allows
> > > > > > > the
> > > > > > > > > > > > > Controller to
> > > > > > > > > > > > > > > > > > act idempotently if any previously-sent
> > > > > registration
> > > > > > > > > response
> > > > > > > > > > > > > gets
> > > > > > > > > > > > > > > > > > lost.  Makes sense.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks, Ron.  That's a good summary.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > One thing I wonder about is if it might
> be
> > > > > helpful
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > send the Cluster ID as determined from
> its
> > > > > > > > > meta.properties
> > > > > > > > > > > file
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > > registration request.  Does it even make
> > > sense
> > > > > for
> > > > > > > the
> > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > > > successfully register and enter the
> Fenced
> > > state
> > > > > if
> > > > > > > it
> > > > > > > > > has
> > > > > > > > > > > the
> > > > > > > > > > > > > wrong
> > > > > > > > > > > > > > > > > > Cluster ID?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Yeah, that's a good idea.  Let's have the
> > > broker
> > > > > pass
> > > > > > > its
> > > > > > > > > > > cluster
> > > > > > > > > > > > > ID in
> > > > > > > > > > > > > > > > > the registration RPC, and then
> registration can
> > > > > fail
> > > > > > > if the
> > > > > > > > > > > broker
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > configured for the wrong cluster.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >  The nextMetadatOffset value that the
> broker
> > > > > > > communicates
> > > > > > > > > > > > > > > > > > in its registration request only has
> meaning
> > > > > within
> > > > > > > the
> > > > > > > > > > > correct
> > > > > > > > > > > > > > > > > > cluster, so it feels to me that the
> > > Controller
> > > > > should
> > > > > > > > > have
> > > > > > > > > > > some
> > > > > > > > > > > > > way
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > perform this sanity check.  There is
> > > currently
> > > > > > > (pre-KIP
> > > > > > > > > 500)
> > > > > > > > > > > a
> > > > > > > > > > > > > check
> > > > > > > > > > > > > > > > > > in the broker to make sure its configured
> > > > > cluster ID
> > > > > > > > > matches
> > > > > > > > > > > the
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > stored in ZooKeeper, and we will have to
> > > perform
> > > > > this
> > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > > somewhere in the KIP-500 world.  If the
> > > > > Controller
> > > > > > > > > doesn't
> > > > > > > > > > > do it
> > > > > > > > > > > > > > > > > > within the registration request then the
> > > broker
> > > > > will
> > > > > > > > > have to
> > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > metadata request to the Controller,
> retrieve
> > > the
> > > > > > > Cluster
> > > > > > > > > ID,
> > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > perform the check itself.  It feels to me
> > > that it
> > > > > > > might
> > > > > > > > > be
> > > > > > > > > > > > > better for
> > > > > > > > > > > > > > > > > > the Controller to just do it, and then
> the
> > > broker
> > > > > > > doesn't
> > > > > > > > > > > have to
> > > > > > > > > > > > > > > > > > worry about it anymore once it
> successfully
> > > > > > > registers.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I also have a question about the
> broker.id
> > > > > value and
> > > > > > > > > > > > > > > meta.properties.
> > > > > > > > > > > > > > > > > > The KIP now says "In version 0 of
> > > > > meta.properties,
> > > > > > > there
> > > > > > > > > is a
> > > > > > > > > > > > > > > > > > broker.id field.  Version 1 does not
> have
> > > this
> > > > > > > field.
> > > > > > > > > It
> > > > > > > > > > > is no
> > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > needed because we no longer support
> dynamic
> > > > > broker id
> > > > > > > > > > > > > assignment."
> > > > > > > > > > > > > > > > > > But then there is an example version 1
> > > > > > > meta.properties
> > > > > > > > > file
> > > > > > > > > > > that
> > > > > > > > > > > > > > > shows
> > > > > > > > > > > > > > > > > > the broker.id value.  I actually wonder
> if
> > > > > maybe the
> > > > > > > > > > > broker.id
> > > > > > > > > > > > > value
> > > > > > > > > > > > > > > > > > would be good to keep in the version 1
> > > > > > > meta.properties
> > > > > > > > > file
> > > > > > > > > > > > > because
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > currently (pre-KIP 500, version 0) acts
> as a
> > > > > sanity
> > > > > > > > > check to
> > > > > > > > > > > make
> > > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > > the broker is using the correct log
> > > directory.
> > > > > > > Similarly
> > > > > > > > > > > with
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > controller.id value on controllers -- it
> > > would
> > > > > > > allow the
> > > > > > > > > > > same
> > > > > > > > > > > > > type
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > sanity check for quorum controllers.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > That's a good point.  I will add broker.id
> > > back,
> > > > > and
> > > > > > > also
> > > > > > > > > add
> > > > > > > > > > > > > > > > > controller.id as a possibility.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 7:41 PM Colin
> McCabe
> > > <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020, at 16:10, Jun Rao
> > > wrote:
> > > > > > > > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the reply. A few more
> > > comments.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks again for the reply.  Sorry for
> the
> > > long
> > > > > > > > > hiatus.  I
> > > > > > > > > > > was
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > vacation for a while.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 55. There is still text that favors
> new
> > > > > broker
> > > > > > > > > > > registration.
> > > > > > > > > > > > > > > "When a
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > first starts up, when it is in the
> > > INITIAL
> > > > > > > state, it
> > > > > > > > > will
> > > > > > > > > > > > > always
> > > > > > > > > > > > > > > > > "win"
> > > > > > > > > > > > > > > > > > > > broker ID conflicts.  However, once
> it is
> > > > > > > granted a
> > > > > > > > > > > lease, it
> > > > > > > > > > > > > > > > > transitions
> > > > > > > > > > > > > > > > > > > > out of the INITIAL state.
> Thereafter,
> > > it may
> > > > > > > lose
> > > > > > > > > > > subsequent
> > > > > > > > > > > > > > > > > conflicts if
> > > > > > > > > > > > > > > > > > > > its broker epoch is stale.  (See
> KIP-380
> > > for
> > > > > some
> > > > > > > > > > > background
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > epoch.)  The reason for favoring new
> > > > > processes
> > > > > > > is to
> > > > > > > > > > > > > accommodate
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > common
> > > > > > > > > > > > > > > > > > > > case where a process is killed with
> kill
> > > -9
> > > > > and
> > > > > > > then
> > > > > > > > > > > > > restarted.
> > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > want it
> > > > > > > > > > > > > > > > > > > > to be able to reclaim its old ID
> quickly
> > > in
> > > > > this
> > > > > > > > > case."
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the reminder.  I have
> clarified
> > > the
> > > > > > > language
> > > > > > > > > > > here.
> > > > > > > > > > > > > > > > > Hopefully now it is clear that we don't
> allow
> > > quick
> > > > > > > re-use
> > > > > > > > > of
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 80.1 Sounds good. Could you document
> that
> > > > > > > listeners
> > > > > > > > > is a
> > > > > > > > > > > > > required
> > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > now? It would also be useful to
> annotate
> > > > > other
> > > > > > > > > required
> > > > > > > > > > > > > configs.
> > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > > > > example, controller.connect should be
> > > > > required.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I added a note specifying that these
> are
> > > > > required.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 80.2 Could you list all deprecated
> > > existing
> > > > > > > configs?
> > > > > > > > > > > Another
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > control.plane.listener.name since
> the
> > > > > > > controller no
> > > > > > > > > > > longer
> > > > > > > > > > > > > sends
> > > > > > > > > > > > > > > > > > > > LeaderAndIsr, UpdateMetadata and
> > > StopReplica
> > > > > > > > > requests.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I added a section specifying some
> > > deprecated
> > > > > > > configs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 83.1 It seems that the broker can
> > > transition
> > > > > from
> > > > > > > > > FENCED
> > > > > > > > > > > to
> > > > > > > > > > > > > > > RUNNING
> > > > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > > > registering for a new broker epoch.
> I am
> > > not
> > > > > > > sure how
> > > > > > > > > > > this
> > > > > > > > > > > > > works.
> > > > > > > > > > > > > > > > > Once the
> > > > > > > > > > > > > > > > > > > > controller fences a broker, there is
> no
> > > need
> > > > > for
> > > > > > > the
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > > > > > > > boker epoch around. So, if the fenced
> > > > > broker's
> > > > > > > > > heartbeat
> > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > > > > > > existing broker epoch will be
> rejected,
> > > > > leading
> > > > > > > the
> > > > > > > > > > > broker
> > > > > > > > > > > > > back
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > FENCED state again.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The broker epoch refers to the broker
> > > > > registration.
> > > > > > > > > So we
> > > > > > > > > > > DO
> > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker epoch around even while the broker
> is
> > > > > fenced.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The broker epoch changes only when
> there
> > > is a
> > > > > new
> > > > > > > > > broker
> > > > > > > > > > > > > > > > > registration.  Fencing or unfencing the
> broker
> > > > > doesn't
> > > > > > > > > change
> > > > > > > > > > > the
> > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > epoch.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 83.5 Good point on KIP-590. Then
> should
> > > we
> > > > > > > expose the
> > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > debugging purposes? If not, we should
> > > > > deprecate
> > > > > > > the
> > > > > > > > > > > > > controllerID
> > > > > > > > > > > > > > > > > field in
> > > > > > > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I think it's OK to expose it for now,
> with
> > > the
> > > > > > > proviso
> > > > > > > > > > > that it
> > > > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > > > be reachable by clients.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 90. We rejected the shared ID with
> just
> > > one
> > > > > > > reason
> > > > > > > > > "This
> > > > > > > > > > > is
> > > > > > > > > > > > > not a
> > > > > > > > > > > > > > > > > good idea
> > > > > > > > > > > > > > > > > > > > because NetworkClient assumes a
> single ID
> > > > > > > space.  So
> > > > > > > > > if
> > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > > > both a
> > > > > > > > > > > > > > > > > > > > controller 1 and a broker 1, we don't
> > > have a
> > > > > way
> > > > > > > of
> > > > > > > > > > > picking
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > "right"
> > > > > > > > > > > > > > > > > > > > one." This doesn't seem to be a
> strong
> > > > > reason.
> > > > > > > For
> > > > > > > > > > > example,
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > address the NetworkClient issue with
> the
> > > node
> > > > > > > type
> > > > > > > > > as you
> > > > > > > > > > > > > pointed
> > > > > > > > > > > > > > > > > out or
> > > > > > > > > > > > > > > > > > > > using the negative value of a broker
> ID
> > > as
> > > > > the
> > > > > > > > > > > controller ID.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It would require a lot of code changes
> to
> > > > > support
> > > > > > > > > multiple
> > > > > > > > > > > > > types of
> > > > > > > > > > > > > > > > > node IDs.  It's not clear to me that the
> end
> > > result
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > > > better --
> > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > tend to think it would be worse, since it
> > > would be
> > > > > more
> > > > > > > > > > > complex.
> > > > > > > > > > > > > In a
> > > > > > > > > > > > > > > > > similar vein, using negative numbers seems
> > > > > dangerous,
> > > > > > > > > since we
> > > > > > > > > > > use
> > > > > > > > > > > > > > > > > negatives or -1 as "special values" in many
> > > places.
> > > > > > > For
> > > > > > > > > > > example,
> > > > > > > > > > > > > -1
> > > > > > > > > > > > > > > often
> > > > > > > > > > > > > > > > > represents "no such node."
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > One important thing to keep in mind is
> > > that we
> > > > > > > want to
> > > > > > > > > be
> > > > > > > > > > > able
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > transition from a broker and a controller
> being
> > > > > > > co-located
> > > > > > > > > to
> > > > > > > > > > > them
> > > > > > > > > > > > > no
> > > > > > > > > > > > > > > > > longer being co-located.  This is much
> easier
> > > to do
> > > > > > > when
> > > > > > > > > they
> > > > > > > > > > > have
> > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > IDs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 100. In KIP-589
> > > > > > > > > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > > > > > > > > > > >,
> > > > > > > > > > > > > > > > > > > > the broker reports all offline
> replicas
> > > due
> > > > > to a
> > > > > > > disk
> > > > > > > > > > > > > failure to
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > controller. It seems this information
> > > needs
> > > > > to be
> > > > > > > > > > > persisted
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > > > log. Do we have a corresponding
> record
> > > for
> > > > > that?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hmm, I have to look into this a little
> bit
> > > > > more.
> > > > > > > We
> > > > > > > > > may
> > > > > > > > > > > need
> > > > > > > > > > > > > a new
> > > > > > > > > > > > > > > > > record type.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 101. Currently, StopReplica request
> has 2
> > > > > modes,
> > > > > > > > > without
> > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > and with
> > > > > > > > > > > > > > > > > > > > deletion. The former is used for
> > > controlled
> > > > > > > shutdown
> > > > > > > > > and
> > > > > > > > > > > > > handling
> > > > > > > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > > > > > failure, and causes the follower to
> > > stop. The
> > > > > > > latter
> > > > > > > > > is
> > > > > > > > > > > for
> > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > deletion
> > > > > > > > > > > > > > > > > > > > and partition reassignment, and
> causes
> > > the
> > > > > > > replica
> > > > > > > > > to be
> > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > > Since we
> > > > > > > > > > > > > > > > > > > > are deprecating StopReplica, could we
> > > > > document
> > > > > > > what
> > > > > > > > > > > triggers
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > stopping
> > > > > > > > > > > > > > > > > > > > of a follower and the deleting of a
> > > replica
> > > > > now?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > RemoveTopic triggers deletion.  In
> general
> > > the
> > > > > > > > > > > functionality of
> > > > > > > > > > > > > > > > > StopReplica is subsumed by the metadata
> > > records.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 102. Should we include the metadata
> > > topic in
> > > > > the
> > > > > > > > > > > > > > > MetadataResponse?
> > > > > > > > > > > > > > > > > If so,
> > > > > > > > > > > > > > > > > > > > when it will be included and what
> will
> > > the
> > > > > > > metadata
> > > > > > > > > > > response
> > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > like?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > No, it won't be included in the
> metadata
> > > > > response
> > > > > > > sent
> > > > > > > > > back
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > brokers.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 103. "The active controller assigns
> the
> > > > > broker a
> > > > > > > new
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > epoch,
> > > > > > > > > > > > > > > > > based on
> > > > > > > > > > > > > > > > > > > > the latest committed offset in the
> log."
> > > This
> > > > > > > seems
> > > > > > > > > > > > > inaccurate
> > > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > latest committed offset doesn't
> always
> > > > > advance on
> > > > > > > > > every
> > > > > > > > > > > log
> > > > > > > > > > > > > > > append.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Given that the new broker epoch won't
> be
> > > > > visible
> > > > > > > until
> > > > > > > > > the
> > > > > > > > > > > > > commit
> > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > happened, I have changed this to "the next
> > > > > available
> > > > > > > > > offset in
> > > > > > > > > > > the
> > > > > > > > > > > > > log"
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 104. REGISTERING(1) : It says
> > > "Otherwise, the
> > > > > > > broker
> > > > > > > > > > > moves
> > > > > > > > > > > > > into
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > FENCED
> > > > > > > > > > > > > > > > > > > > state.". It seems this should be
> RUNNING?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 105. RUNNING: Should we require the
> > > broker to
> > > > > > > catch
> > > > > > > > > up
> > > > > > > > > > > to the
> > > > > > > > > > > > > > > > > metadata log
> > > > > > > > > > > > > > > > > > > > to get into this state?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > For 104 and 105, these sections have
> been
> > > > > reworked.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Oct 23, 2020 at 1:20 PM Colin
> > > McCabe
> > > > > <
> > > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 21, 2020, at 05:51, Tom
> > > Bentley
> > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 19, 2020, at 08:59,
> Ron
> > > > > Dagostino
> > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Hi Colin.  Thanks for the
> hard
> > > work
> > > > > on
> > > > > > > this
> > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I have some questions about
> what
> > > > > happens
> > > > > > > to a
> > > > > > > > > > > broker
> > > > > > > > > > > > > > > when it
> > > > > > > > > > > > > > > > > becomes
> > > > > > > > > > > > > > > > > > > > > > > > fenced (e.g. because it can't
> > > send a
> > > > > > > > > heartbeat
> > > > > > > > > > > > > request to
> > > > > > > > > > > > > > > > > keep its
> > > > > > > > > > > > > > > > > > > > > > > > lease).  The KIP says "When a
> > > broker
> > > > > is
> > > > > > > > > fenced,
> > > > > > > > > > > it
> > > > > > > > > > > > > cannot
> > > > > > > > > > > > > > > > > process any
> > > > > > > > > > > > > > > > > > > > > > > > client requests.  This
> prevents
> > > > > brokers
> > > > > > > which
> > > > > > > > > > > are not
> > > > > > > > > > > > > > > > > receiving
> > > > > > > > > > > > > > > > > > > > > > > > metadata updates or that are
> not
> > > > > > > receiving
> > > > > > > > > and
> > > > > > > > > > > > > processing
> > > > > > > > > > > > > > > > > them fast
> > > > > > > > > > > > > > > > > > > > > > > > enough from causing issues to
> > > > > clients."
> > > > > > > And
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > description of the
> > > > > > > > > > > > > > > > > > > > > > > > FENCED(4) state it likewise
> says
> > > > > "While
> > > > > > > in
> > > > > > > > > this
> > > > > > > > > > > > > state,
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > > > > > > > > > not respond to client
> > > requests."  It
> > > > > > > makes
> > > > > > > > > sense
> > > > > > > > > > > > > that a
> > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > > > > should not accept producer
> > > requests
> > > > > -- I
> > > > > > > > > assume
> > > > > > > > > > > any
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > > > > > > > > would result in
> > > > > > > NotLeaderOrFollowerException.
> > > > > > > > > > > But
> > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > about KIP-392
> > > > > > > > > > > > > > > > > > > > > > > > (fetch from follower)
> consumer
> > > > > > > requests?  It
> > > > > > > > > is
> > > > > > > > > > > > > > > conceivable
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > > > > > > > > could continue.  Related to
> that,
> > > > > would a
> > > > > > > > > fenced
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > > > > > > > > fetch data for partitions
> where
> > > it
> > > > > > > thinks it
> > > > > > > > > is a
> > > > > > > > > > > > > > > follower?
> > > > > > > > > > > > > > > > > Even if
> > > > > > > > > > > > > > > > > > > > > > > > it rejects consumer requests
> it
> > > might
> > > > > > > still
> > > > > > > > > > > continue
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > fetch as a
> > > > > > > > > > > > > > > > > > > > > > > > follower.  Might it be
> helpful to
> > > > > clarify
> > > > > > > > > both
> > > > > > > > > > > > > decisions
> > > > > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi Ron,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Good question.  I think a
> fenced
> > > broker
> > > > > > > should
> > > > > > > > > > > > > continue to
> > > > > > > > > > > > > > > > > fetch on
> > > > > > > > > > > > > > > > > > > > > > > partitions it was already
> fetching
> > > > > before
> > > > > > > it
> > > > > > > > > was
> > > > > > > > > > > > > fenced,
> > > > > > > > > > > > > > > > > unless it
> > > > > > > > > > > > > > > > > > > > > hits a
> > > > > > > > > > > > > > > > > > > > > > > problem.  At that point it
> won't be
> > > > > able to
> > > > > > > > > > > continue,
> > > > > > > > > > > > > > > since it
> > > > > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > > > > the new metadata.  For
> example, it
> > > > > won't
> > > > > > > know
> > > > > > > > > about
> > > > > > > > > > > > > > > leadership
> > > > > > > > > > > > > > > > > changes
> > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > the partitions it's fetching.
> The
> > > > > > > rationale
> > > > > > > > > for
> > > > > > > > > > > > > > > continuing to
> > > > > > > > > > > > > > > > > fetch
> > > > > > > > > > > > > > > > > > > > > is to
> > > > > > > > > > > > > > > > > > > > > > > try to avoid disruptions as
> much as
> > > > > > > possible.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I don't think fenced brokers
> should
> > > > > accept
> > > > > > > > > client
> > > > > > > > > > > > > requests.
> > > > > > > > > > > > > > > > > The issue
> > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > that the fenced broker may or
> may
> > > not
> > > > > have
> > > > > > > any
> > > > > > > > > > > data it
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > supposed to
> > > > > > > > > > > > > > > > > > > > > > > have.  It may or may not have
> > > applied
> > > > > any
> > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > changes, etc.
> > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > it is supposed to have applied.
> > > So it
> > > > > > > could
> > > > > > > > > get
> > > > > > > > > > > pretty
> > > > > > > > > > > > > > > > > confusing, and
> > > > > > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > > > > > > potentially waste the client's
> > > time.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > When fenced, how would the broker
> > > reply
> > > > > to a
> > > > > > > > > client
> > > > > > > > > > > > > which did
> > > > > > > > > > > > > > > > > make a
> > > > > > > > > > > > > > > > > > > > > > request?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Tom,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The broker will respond with a
> > > retryable
> > > > > error
> > > > > > > in
> > > > > > > > > that
> > > > > > > > > > > > > case.
> > > > > > > > > > > > > > > Once
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > client has re-fetched its
> metadata, it
> > > > > will no
> > > > > > > > > longer
> > > > > > > > > > > see
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > fenced broker
> > > > > > > > > > > > > > > > > > > > > as part of the cluster.  I added a
> > > note to
> > > > > the
> > > > > > > KIP.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > best,
> > > > > > > > > > > > > > > > > > > > > Colin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Tom
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

Reply via email to