Hi, Colin,

Thanks for the reply. Just a couple more comments.

211. Currently, the broker only registers itself in ZK after log recovery.
Is there any benefit to changing that? As you mentioned, the broker can't do
much before completing log recovery.

230. Regarding MetadataResponse, there is a slight awkwardness. We return
rack for each node. However, if that node is the controller, the rack
field is not really relevant. Should we clean it up here or in another KIP
like KIP-700?
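To make the awkwardness concrete, here is a toy sketch (plain Python dicts
with invented field names, not the actual MetadataResponse schema):

```python
# Toy illustration, not the real MetadataResponse schema: every node
# carries a rack field, but for the controller entry the rack tells
# clients nothing useful.
nodes = [
    {"node_id": 0, "host": "broker0", "rack": "rack-a"},
    {"node_id": 1, "host": "broker1", "rack": "rack-b"},
    {"node_id": 2, "host": "ctrl0", "rack": None},  # controller: rack is irrelevant
]
controller_id = 2
broker_racks = [n["rack"] for n in nodes if n["node_id"] != controller_id]
print(broker_racks)  # ['rack-a', 'rack-b']
```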

Thanks,

Jun

On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply. A few follow up comments.
> >
> > 211. When does the broker send the BrokerRegistration request to the
> > controller? Is it after the recovery phase? If so, at that point, the
> > broker has already caught up on the metadata (in order to clean up
> deleted
> > partitions). Then, it seems we don't need the ShouldFence field
> > in BrokerHeartbeatRequest?
>
> Hi Jun,
>
> Thanks again for the reviews.
>
> The broker sends the registration request as soon as it starts up.  It
> cannot wait until the recovery phase is over since sometimes log recovery
> can take quite a long time.
>
> >
> > 213. kafka-cluster.sh
> > 213.1 For the decommission example, should the command take a broker id?
>
> Yes, the example should have a broker id.  Fixed.
>
> > 213.2 Existing tools use the "--" command line option (e.g. kafka-topics
> > --list --topic test). Should we follow the same convention
> > for kafka-cluster.sh (and kafka-storage.sh)?
>
> Hmm.  I don't think argparse4j supports using double dashes to identify
> subcommands.  I think it might be confusing as well, since the subcommand
> must come first, unlike a plain old argument which can be anywhere on the
> command line.
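As an aside, Python's argparse (which argparse4j is modeled on) has the same
property, and a toy sketch shows it: the subcommand is a positional argument
and must come first, while "--" options can float. The "decommission --id"
names below are just for illustration, not the final tool syntax.

```python
# Toy analogy using Python's argparse: a subcommand is positional, so
# it has to lead the command line, unlike "--" style options.
import argparse

parser = argparse.ArgumentParser(prog="kafka-cluster")
subparsers = parser.add_subparsers(dest="command", required=True)
decommission = subparsers.add_parser("decommission")
decommission.add_argument("--id", type=int, required=True)

args = parser.parse_args(["decommission", "--id", "3"])
print(args.command, args.id)  # decommission 3
```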
>
> > 213.3 Should we add an admin api for broker decommission so that this can
> > be done programmatically?
> >
>
> Yes, there is an admin client API for decommissioning as well.
>
> > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the request was
> > sent to is not the controller". I thought the clients never send a
> request
> > to the controller directly now and the broker will forward it to the
> > controller?
> >
>
> If the controller moved recently, it's possible that the broker could send
> to a controller that has just recently become inactive.  In that case
> NOT_CONTROLLER would be returned.  (A standby controller returns
> NOT_CONTROLLER for most APIs).
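A toy sketch of that retry path (all names invented for illustration): the
forwarding broker hits a standby that just lost the controllership, gets
NOT_CONTROLLER back, and tries the next candidate.

```python
# Toy sketch, not Kafka code: retry against candidate controllers until
# one of them does not answer NOT_CONTROLLER.
def send_to_controller(candidates, request):
    for node in candidates:
        response = node(request)
        if response != "NOT_CONTROLLER":
            return response
    raise RuntimeError("no active controller found")

standby = lambda req: "NOT_CONTROLLER"  # recently became inactive
active = lambda req: "OK"
print(send_to_controller([standby, active], "DecommissionBroker"))  # OK
```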
>
> > 221. Could we add the required ACL for the new requests?
> >
>
> Good point.  I added the required ACL for each new RPC.
>
> best,
> Colin
>
>
> > Jun
> >
> > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > >
> > > Hi Jun,
> > >
> > > Thanks for the comments.
> > >
> > > > 206. "RemoveTopic is the last step, which scrubs all metadata about
> the
> > > > topic.  In order to get to that last step, the topic data needs to be
> > > removed
> > > > from all brokers (after each broker notices that the topic is being
> > > > deleted)." Currently, this is done based on the response of
> > > > StopReplicaRequest. Since the controller no longer sends this
> request,
> > > how
> > > > does the controller know that the data for the deleted topic has
> > > > been removed in the brokers?
> > > >
> > >
> > > That's a good point... definitely an oversight.
> > >
> > > It seems complex to force the controller to track when log directories
> > > have been deleted.  Let's just assume that KIP-516 has been
> implemented,
> > > and track them by UUID.  Then we can just have a single topic deletion
> > > record.
> > >
> > > I added a section on "topic identifiers" describing this.
> > >
> > > > 210. Thanks for the explanation. Sounds good to me.
> > > >
> > > > 211. I still don't see an example when ShouldFence is set to true in
> > > > BrokerHeartbeatRequest. Could we add one?
> > > >
> > >
> > > It is set to true when the broker is first starting up and doesn't yet
> > > want to be unfenced.  I added a longer explanation of this in the
> "Broker
> > > Leases" section.
> > >
> > > > 213. The KIP now allows replicas to be assigned on brokers that are
> > > fenced,
> > > > which is an improvement. How do we permanently remove a broker (e.g.
> > > > cluster shrinking) to prevent it from being used for future replica
> > > > assignments?
> > > >
> > >
> > > This is a fair point.  I will create a kafka-cluster.sh script which
> can
> > > do this, plus a DecommissionBrokerRequest.
> > >
> > > As a bonus the kafka-cluster.sh script can help people find the
> cluster ID
> > > of brokers, something that people have wanted a tool for in the past.
> > >
> > > > 214. "Currently, when a node is down, all of its ZK registration
> > > > information is gone.  But we need this information in order to
> > > understand
> > > > things like whether the replicas of a particular partition are
> > > > well-balanced across racks." This is not quite true. Currently, even
> when
> > > > ZK registration is gone, the existing replica assignment is still
> > > available
> > > > in the metadata response.
> > > >
> > >
> > > I agree that the existing replica assignment is still available.  But
> that
> > > just tells you partition X is on nodes A, B, and C.  If you don't have
> the
> > > ZK registration for one or more of A, B, or C then you don't know
> whether
> > > we are following the policy of "two replicas on one rack, one replica
> on
> > > another."  Or any other more complex rack placement policy that you
> might
> > > have.
> > >
> > > best,
> > > Colin
> > >
> > > > Jun
> > > >
> > > > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe <cmcc...@apache.org>
> wrote:
> > > >
> > > > > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > > > > Hi, Colin,
> > > > > >
> > > > > > Thanks for the reply. A few more follow up comments.
> > > > > >
> > > > > > 210. initial.broker.registration.timeout.ms: The default value
> is
> > > 90sec,
> > > > > > which seems long. If a broker fails the registration because of
> > > incorrect
> > > > > > configs, we want to fail faster. In comparison, the default for
> > > > > > zookeeper.connection.timeout.ms is 18 secs.
> > > > > >
> > > > >
> > > > > Hi Jun,
> > > > >
> > > > > I agree that the initial connection timeout here is longer than
> what we
> > > > > had with ZK.  The main reason I selected a slightly longer timeout
> > > here is
> > > > > to handle the case where the controllers and the brokers are
> > > co-located.
> > > > > For example, if you have a 3 node cluster and all three nodes are
> > > > > controllers+brokers, when you first bring up the cluster, we will
> have
> > > to
> > > > > stand up the controller quorum and then handle broker
> registrations.
> > > > > Although we believe Raft will start up relatively quickly, it's
> good to
> > > > > leave some extra margin here.
> > > > >
> > > > > I don't think there's a big disadvantage to having a slightly
> longer
> > > > > timeout here.  After all, starting up the brokers with no
> controllers
> > > is an
> > > > > admin mistake, which we don't expect to see very often.  Maybe
> let's
> > > set it
> > > > > to 60 seconds for now and maybe see if we can tweak it in the
> future if
> > > > > that turns out to be too long or short?
> > > > >
> > > > > > 211. "Once they are all moved, the controller responds to the
> > > heartbeat
> > > > > > with a nextState of SHUTDOWN." It seems that nextState is no
> longer
> > > > > > relevant. Also, could you add a bit more explanation on when
> > > ShouldFence
> > > > > is
> > > > > > set to true in BrokerHeartbeatRequest?
> > > > > >
> > > > >
> > > > > Good catch.  I removed the obsolete section referring to nextState
> and
> > > > > added a reference to the new boolean.  I also added some more
> > > information
> > > > > about ShouldFence and about the rationale for separating fencing
> from
> > > > > registration.
> > > > >
> > > > > > 212. Related to the above, what does the broker do if IsFenced is
> > > true in
> > > > > > the BrokerHeartbeatResponse? Will the broker do the same thing on
> > > > > receiving
> > > > > > a FenceBrokerRecord from the metadata log?
> > > > > >
> > > > >
> > > > > The broker only checks this boolean during the startup process.
> After
> > > the
> > > > > startup process is finished, it ignores this boolean.
> > > > >
> > > > > The broker uses fence / unfence records in the metadata log to
> > > determine
> > > > > which brokers should appear in its MetadataResponse.
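A toy sketch of that replay logic (the record names come from the KIP; the
record shapes and the fold itself are invented for illustration):

```python
# Toy sketch, not broker code: replay fencing-related records from the
# metadata log to decide which brokers appear in MetadataResponse.
records = [
    ("RegisterBroker", 0), ("RegisterBroker", 1),
    ("UnfenceBroker", 0), ("UnfenceBroker", 1),
    ("FenceBroker", 1),
]
registered, fenced = set(), set()
for kind, broker_id in records:
    if kind == "RegisterBroker":
        registered.add(broker_id)
        fenced.add(broker_id)       # brokers start out fenced
    elif kind == "UnfenceBroker":
        fenced.discard(broker_id)
    elif kind == "FenceBroker":
        fenced.add(broker_id)
visible = sorted(registered - fenced)
print(visible)  # [0]
```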
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe <cmcc...@apache.org
> >
> > > wrote:
> > > > > >
> > > > > > > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > > > > > > Hi Colin,
> > > > > > > >
> > > > > > > > The KIP says that "brokers which are fenced will not appear
> in
> > > > > > > > MetadataResponses.  So clients that have up-to-date metadata
> will
> > > > > not try
> > > > > > > > to contact fenced brokers.", which is fine, but it doesn't
> seem
> > > to
> > > > > > > > elaborate on what happens for clients which try to contact a
> > > broker
> > > > > > > (using
> > > > > > > > stale metadata, for example) and find it in the
> > > registered-but-fenced
> > > > > > > > state.
> > > > > > >
> > > > > > > Hi Tom,
> > > > > > >
> > > > > > > I have dropped the broker-side fencing from this proposal.  So
> now
> > > the
> > > > > > > fencing is basically the same as today's fencing: it's
> > > controller-side
> > > > > > > only.  That means that clients using stale metadata can contact
> > > fenced
> > > > > > > brokers and communicate with them.  The only case where the
> broker
> > > > > will be
> > > > > > > inaccessible is when it hasn't finished starting yet (i.e., has
> > > not yet
> > > > > > > attained RUNNING state.)
> > > > > > >
> > > > > > > Just like today, we expect the safeguards built into the
> > > replication
> > > > > > > protocol to prevent the worst corner cases that could result
> from
> > > > > this.  I
> > > > > > > do think we will probably take up this issue later and find a
> > > better
> > > > > > > solution for client-side metadata, but this KIP is big enough
> > > as-is.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > > You said previously that the broker will respond with a
> retriable
> > > > > > > > error, which is again fine, but you didn't answer my question
> > > about
> > > > > which
> > > > > > > > error code you've chosen for this. I realise that this
> doesn't
> > > really
> > > > > > > > affect the correctness of the behaviour, but I'm not aware
> of any
> > > > > > > existing
> > > > > > > > error code which is a good fit. So it would be good to
> understand
> > > > > about
> > > > > > > how
> > > > > > > > you're making this backward compatible for clients.
> > > > > > > >
> > > > > > > > Many thanks,
> > > > > > > >
> > > > > > > > Tom
> > > > > > > >
> > > > > > > > On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe <
> cmcc...@apache.org
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > > > > > > > > Hi, Colin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply. Just a couple more comments
> below.
> > > > > > > > > >
> > > > > > > > > > 210. Since we are deprecating
> > > zookeeper.connection.timeout.ms,
> > > > > > > should we
> > > > > > > > > > add a new config to bound the time for a broker to
> connect
> > > to the
> > > > > > > > > > controller during starting up?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Good idea.  I added initial.broker.registration.timeout.ms
> for
> > > > > this.
> > > > > > > > >
> > > > > > > > > > 211. BrokerHeartbeat no longer has the state field in the
> > > > > > > > > request/response.
> > > > > > > > > > However, (a) the controller shutdown section still has
> "In
> > > its
> > > > > > > periodic
> > > > > > > > > > heartbeats, the broker asks the controller if it can
> > > transition
> > > > > into
> > > > > > > the
> > > > > > > > > > SHUTDOWN state.  This motivates the controller to move
> all
> > > of the
> > > > > > > leaders
> > > > > > > > > > off of that broker.  Once they are all moved, the
> controller
> > > > > > > responds to
> > > > > > > > > > the heartbeat with a nextState of SHUTDOWN."; (b) the
> > > > > description of
> > > > > > > > > > BrokerHeartbeat still references currentState and
> > > targetState.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.  I've made these sections clearer and removed the
> > > obsolete
> > > > > > > > > references to sending states.
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2020 at 1:33 PM Colin McCabe <
> > > cmcc...@apache.org
> > > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 9, 2020, at 10:10, Jun Rao wrote:
> > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the update. A few more follow up comments.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Jun,
> > > > > > > > > > >
> > > > > > > > > > > Thanks again for the review.
> > > > > > > > > > >
> > > > > > > > > > > > 100. FailedReplicaRecord: Since this is reported by
> each
> > > > > broker
> > > > > > > > > > > > independently, perhaps we could use a more concise
> > > > > representation
> > > > > > > > > that
> > > > > > > > > > > has
> > > > > > > > > > > > a top level broker field, an array of topics, which
> has
> > > an
> > > > > array
> > > > > > > of
> > > > > > > > > > > > partitions.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The issue is that there is a size limit on each
> record.
> > > > > > > Putting
> > > > > > > > > all
> > > > > > > > > > > of the partitions of a log directory into a single
> record
> > > would
> > > > > > > > > probably
> > > > > > > > > > > break that in many cases.  Still, we can optimize a
> bit by
> > > > > having
> > > > > > > an
> > > > > > > > > array
> > > > > > > > > > > of partition IDs, since nearly all the time, we have
> more
> > > than
> > > > > one
> > > > > > > > > from the
> > > > > > > > > > > same topic.
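A toy sketch of the shape being described, one compact record per failed
broker and topic (field names and the "payments" topic are illustrative,
not the final record schema):

```python
# Toy illustration, not the final FailedReplicaRecord schema: an array
# of partition IDs keeps the record compact, since a failure usually
# affects several partitions of the same topic.
failed_replica_record = {
    "brokerId": 1,
    "topic": "payments",        # hypothetical topic name
    "partitionIds": [0, 2, 5],  # one compact array instead of one record each
}
print(len(failed_replica_record["partitionIds"]))  # 3
```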
> > > > > > > > > > >
> > > > > > > > > > > > 200. Sounds good. If we remove the broker-side
> fencing
> > > > > logic, do
> > > > > > > we
> > > > > > > > > plan
> > > > > > > > > > > to
> > > > > > > > > > > > still keep FENCED in broker state? Do we plan to
> expose
> > > the
> > > > > new
> > > > > > > > > states
> > > > > > > > > > > > through the existing BrokerState metric and if so,
> what
> > > are
> > > > > the
> > > > > > > > > values
> > > > > > > > > > > for
> > > > > > > > > > > > the new states?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > No, we don't need FENCED any more.  I have removed it
> from
> > > the
> > > > > KIP.
> > > > > > > > > > >
> > > > > > > > > > > The new states are very similar to the current ones,
> > > actually.
> > > > > > > There
> > > > > > > > > are
> > > > > > > > > > > no new states or removed ones.  The main change in the
> > > broker
> > > > > state
> > > > > > > > > machine
> > > > > > > > > > > is that the RECOVERING_FROM_UNCLEAN_SHUTDOWN state has
> been
> > > > > > > renamed to
> > > > > > > > > > > RECOVERY.  Also, unlike previously, the broker will
> always
> > > pass
> > > > > > > through
> > > > > > > > > > > RECOVERY (although it may only stay in this state for a
> > > very
> > > > > short
> > > > > > > > > amount
> > > > > > > > > > > of time).
> > > > > > > > > > >
> > > > > > > > > > > > 201. This may be fine too. Could we document what
> happens
> > > > > when
> > > > > > > the
> > > > > > > > > > > > broker.id/controller.id in meta.properties don't
> > > match
> > > > > the
> > > > > > > > > broker
> > > > > > > > > > > > config when the broker starts up?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I added some documentation about this.
> > > > > > > > > > >
> > > > > > > > > > > > 204. There is still "The highest metadata offset
> which
> > > the
> > > > > broker
> > > > > > > > > has not
> > > > > > > > > > > > reached" referenced under BrokerRegistration.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > It should be CurrentMetadataOffset.  Fixed.
> > > > > > > > > > >
> > > > > > > > > > > > 206. Is that separate step needed given KIP-516? With
> > > > > KIP-516 (
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers#KIP516:TopicIdentifiers-LeaderAndIsr
> > > > > > > > > > > ),
> > > > > > > > > > > > we don't need to wait for the topic data to be
> removed
> > > from
> > > > > all
> > > > > > > > > brokers
> > > > > > > > > > > > before removing the topic metadata. The combination
> of
> > > > > unmatching
> > > > > > > > > > > > topicId
> > > > > > > > > > > > or the missing topicId from the metadata is enough
> for
> > > the
> > > > > > > broker to
> > > > > > > > > > > > clean
> > > > > > > > > > > > up deleted topics asynchronously.
> > > > > > > > > > >
> > > > > > > > > > > It won't be needed once KIP-516 is adopted, but this
> hasn't
> > > > > been
> > > > > > > > > > > implemented yet.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Dec 8, 2020 at 5:27 PM Colin McCabe <
> > > > > cmcc...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Dec 3, 2020, at 16:37, Jun Rao wrote:
> > > > > > > > > > > > > > Hi, Colin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the updated KIP. A few more comments
> > > below.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks again for the reviews.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 80.2 For deprecated configs, we need to include
> > > > > zookeeper.*
> > > > > > > and
> > > > > > > > > > > > > > broker.id.generation.enable.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Added.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 83.1 If a broker is down, does the controller
> keep
> > > the
> > > > > > > previously
> > > > > > > > > > > > > > registered broker epoch forever? If not, how long
> > > does
> > > > > the
> > > > > > > > > controller
> > > > > > > > > > > > > keep
> > > > > > > > > > > > > > it? What does the controller do when receiving a
> > > broker
> > > > > > > heartbeat
> > > > > > > > > > > request
> > > > > > > > > > > > > > with an unknown broker epoch?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, the controller keeps the previous registration
> > > > > forever.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Broker heartbeat requests with an incorrect broker
> > > epoch
> > > > > will
> > > > > > > be
> > > > > > > > > > > rejected
> > > > > > > > > > > > > with STALE_BROKER_EPOCH.
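A toy sketch of that check (not the real controller code; the logic here is
just my illustration of the rule stated above):

```python
# Toy sketch: the controller remembers the epoch from each broker's
# latest registration and rejects heartbeats carrying any other epoch.
broker_epochs = {1: 102}  # brokerId -> epoch from latest registration

def handle_heartbeat(broker_id, broker_epoch):
    if broker_epochs.get(broker_id) != broker_epoch:
        return "STALE_BROKER_EPOCH"
    return "NONE"

print(handle_heartbeat(1, 102), handle_heartbeat(1, 95))
```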
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 100. Have you figured out if we need to add a new
> > > record
> > > > > > > type for
> > > > > > > > > > > > > reporting
> > > > > > > > > > > > > > partitions on failed disks?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I added FailedReplicaRecord to reflect the case
> where a
> > > > > JBOD
> > > > > > > > > directory
> > > > > > > > > > > has
> > > > > > > > > > > > > failed, leading to failed replicas.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 102. For debugging purposes, sometimes it's
> useful to
> > > > > read
> > > > > > > the
> > > > > > > > > > > metadata
> > > > > > > > > > > > > > topic using tools like console-consumer. Should
> we
> > > > > support
> > > > > > > that
> > > > > > > > > and
> > > > > > > > > > > if
> > > > > > > > > > > > > so,
> > > > > > > > > > > > > > how?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > For now, we have the ability to read the metadata
> logs
> > > > > with the
> > > > > > > > > > > dump-logs
> > > > > > > > > > > > > tool.  I think we will come up with some other
> tools
> > > in the
> > > > > > > future
> > > > > > > > > as
> > > > > > > > > > > we
> > > > > > > > > > > > > get experience.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 200. "brokers which are fenced will not appear in
> > > > > > > > > MetadataResponses.
> > > > > > > > > > > The
> > > > > > > > > > > > > > broker will not respond to these requests--
> instead,
> > > it
> > > > > will
> > > > > > > > > simply
> > > > > > > > > > > > > > disconnect." If the controller is partitioned off
> > > from
> > > > > the
> > > > > > > > > brokers,
> > > > > > > > > > > this
> > > > > > > > > > > > > > design will cause every broker to stop accepting
> new
> > > > > client
> > > > > > > > > > > requests. In
> > > > > > > > > > > > > > contrast, if ZK is partitioned off, the existing
> > > > > behavior is
> > > > > > > > > that the
> > > > > > > > > > > > > > brokers can continue to work based on the last
> known
> > > > > > > metadata.
> > > > > > > > > So, I
> > > > > > > > > > > am
> > > > > > > > > > > > > not
> > > > > > > > > > > > > > sure if we should change the existing behavior
> > > because
> > > > > of the
> > > > > > > > > bigger
> > > > > > > > > > > > > impact
> > > > > > > > > > > > > > in the new one. Another option is to keep the
> > > existing
> > > > > > > behavior
> > > > > > > > > and
> > > > > > > > > > > > > expose
> > > > > > > > > > > > > > a metric for fenced brokers so that the operator
> > > could be
> > > > > > > > > alerted.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm skeptical about how well running without ZK
> > > currently
> > > > > > > works.
> > > > > > > > > > > However,
> > > > > > > > > > > > > I will move the broker-side fencing into a
> follow-up
> > > KIP.
> > > > > This
> > > > > > > > > KIP is
> > > > > > > > > > > > > already pretty large and there is no hard
> dependency on
> > > > > this.
> > > > > > > > > There
> > > > > > > > > > > may
> > > > > > > > > > > > > also be other ways of accomplishing the positive
> > > effects of
> > > > > > > what
> > > > > > > > > > > > > broker-side fencing does, so more discussion is needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 201. I read Ron's comment, but I am still not
> sure
> > > the
> > > > > > > benefit of
> > > > > > > > > > > keeping
> > > > > > > > > > > > > > broker.id and controller.id in meta.properties.
> It
> > > seems
> > > > > > > that
> > > > > > > > > we are
> > > > > > > > > > > > > just
> > > > > > > > > > > > > > duplicating the same info in two places and have
> the
> > > > > > > additional
> > > > > > > > > > > burden of
> > > > > > > > > > > > > > making sure the values in the two places are
> > > consistent.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think the reasoning is that having broker.id
> > > protects us
> > > > > > > against
> > > > > > > > > > > > > accidentally bringing up a broker with a disk from
> a
> > > > > different
> > > > > > > > > > > broker.  I
> > > > > > > > > > > > > don't feel strongly about this but it seemed
> simpler to
> > > > > keep
> > > > > > > it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 202. controller.connect.security.protocol: Is
> this
> > > needed
> > > > > > > since
> > > > > > > > > > > > > > controller.listener.names and
> > > > > listener.security.protocol.map
> > > > > > > > > imply
> > > > > > > > > > > the
> > > > > > > > > > > > > > security protocol already?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > You're right, this isn't needed.  I'll remove it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 203. registration.heartbeat.interval.ms: It
> > > defaults to
> > > > > 2k.
> > > > > > > ZK
> > > > > > > > > uses
> > > > > > > > > > > 1/3
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > the session timeout for heartbeat. So, given the
> > > default
> > > > > 18k
> > > > > > > for
> > > > > > > > > > > > > > registration.lease.timeout.ms, should we default
> > > > > > > > > > > > > > registration.heartbeat.interval.ms to 6k?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 6 seconds seems like a pretty long time between
> > > > > heartbeats.  It
> > > > > > > > > might
> > > > > > > > > > > be
> > > > > > > > > > > > > useful to know when a broker is missing heartbeats,
> > > with
> > > > > less
> > > > > > > time
> > > > > > > > > than
> > > > > > > > > > > > > that.  I provisionally set it to 3 seconds (we can
> > > always
> > > > > > > change
> > > > > > > > > > > later...)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also changed the name of these configurations to
> "
> > > > > > > > > > > > > broker.heartbeat.interval.ms" and "
> > > > > > > broker.registration.timeout.ms"
> > > > > > > > > to
> > > > > > > > > > > try
> > > > > > > > > > > > > to clarify them a bit.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 204. "The highest metadata offset which the
> broker
> > > has
> > > > > not
> > > > > > > > > reached."
> > > > > > > > > > > It
> > > > > > > > > > > > > > seems this should be "has reached".
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I changed this to "one more than the highest
> metadata
> > > > > offset
> > > > > > > which
> > > > > > > > > the
> > > > > > > > > > > > > broker has reached."
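A tiny worked example of the clarified wording, i.e. the field is the next
offset the broker still needs to fetch:

```python
# If the broker has applied metadata records up to and including offset
# 41, the registration reports one more than that: 42.
highest_reached = 41
current_metadata_offset = highest_reached + 1
print(current_metadata_offset)  # 42
```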
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 205. UnfenceBrokerRecord and
> UnregisterBrokerRecord:
> > > To
> > > > > me,
> > > > > > > they
> > > > > > > > > > > seem to
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > the same. Do we need both?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unregistration means that the broker has been
> removed
> > > from
> > > > > the
> > > > > > > > > cluster.
> > > > > > > > > > > > > That is different than unfencing, which marks the
> > > broker as
> > > > > > > active.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 206. TopicRecord: The Deleting field is used to
> > > indicate
> > > > > > > that the
> > > > > > > > > > > topic
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > being deleted. I am wondering if this is really
> > > needed
> > > > > since
> > > > > > > > > > > RemoveTopic
> > > > > > > > > > > > > > already indicates the same thing.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > RemoveTopic is the last step, which scrubs all
> metadata
> > > > > about
> > > > > > > the
> > > > > > > > > topic.
> > > > > > > > > > > > > In order to get to that last step, the topic data
> > > needs to be
> > > > > > > removed
> > > > > > > > > > > from all
> > > > > > > > > > > > > brokers (after each broker notices that the topic
> is
> > > being
> > > > > > > > > deleted).
> > > > > > > > > > > > >
> > > > > > > > > > > > > best,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Dec 2, 2020 at 2:50 PM Colin McCabe <
> > > > > > > cmcc...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Dec 2, 2020, at 14:07, Ron Dagostino
> wrote:
> > > > > > > > > > > > > > > > Hi Colin.  Thanks for the updates.  It's now
> > > clear
> > > > > to me
> > > > > > > that
> > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > keep their broker epoch for the life of their
> > > JVM --
> > > > > they
> > > > > > > > > > > register
> > > > > > > > > > > > > > > > once, get their broker epoch in the
> response, and
> > > > > then
> > > > > > > never
> > > > > > > > > > > > > > > > re-register again.  Brokers may get fenced,
> but
> > > they
> > > > > > > keep the
> > > > > > > > > > > same
> > > > > > > > > > > > > > > > broker epoch for the life of their JVM.  The
> > > > > incarnation
> > > > > > > ID
> > > > > > > > > is
> > > > > > > > > > > also
> > > > > > > > > > > > > > > > kept for the life of the JVM but is
> generated by
> > > the
> > > > > > > broker
> > > > > > > > > > > itself
> > > > > > > > > > > > > > > > upon startup, and the combination of the two
> > > allows
> > > > > the
> > > > > > > > > > > Controller to
> > > > > > > > > > > > > > > > act idempotently if any previously-sent
> > > registration
> > > > > > > response
> > > > > > > > > > > gets
> > > > > > > > > > > > > > > > lost.  Makes sense.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks, Ron.  That's a good summary.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One thing I wonder about is if it might be
> > > helpful
> > > > > for
> > > > > > > the
> > > > > > > > > > > broker to
> > > > > > > > > > > > > > > > send the Cluster ID as determined from its
> > > > > > > meta.properties
> > > > > > > > > file
> > > > > > > > > > > in
> > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > registration request.  Does it even make
> sense
> > > for
> > > > > the
> > > > > > > > > broker to
> > > > > > > > > > > > > > > > successfully register and enter the Fenced
> state
> > > if
> > > > > it
> > > > > > > has
> > > > > > > > > the
> > > > > > > > > > > wrong
> > > > > > > > > > > > > > > > Cluster ID?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yeah, that's a good idea.  Let's have the
> broker
> > > pass
> > > > > its
> > > > > > > > > cluster
> > > > > > > > > > > ID in
> > > > > > > > > > > > > > > the registration RPC, and then registration can
> > > fail
> > > > > if the
> > > > > > > > > broker
> > > > > > > > > > > is
> > > > > > > > > > > > > > > configured for the wrong cluster.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  The nextMetadataOffset value that the broker
> > > > > communicates
> > > > > > > > > > > > > > > > in its registration request only has meaning
> > > within
> > > > > the
> > > > > > > > > correct
> > > > > > > > > > > > > > > > cluster, so it feels to me that the
> Controller
> > > should
> > > > > > > have
> > > > > > > > > some
> > > > > > > > > > > way
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > perform this sanity check.  There is
> currently
> > > > > (pre-KIP
> > > > > > > 500)
> > > > > > > > > a
> > > > > > > > > > > check
> > > > > > > > > > > > > > > > in the broker to make sure its configured
> > > cluster ID
> > > > > > > matches
> > > > > > > > > the
> > > > > > > > > > > one
> > > > > > > > > > > > > > > > stored in ZooKeeper, and we will have to
> perform
> > > this
> > > > > > > > > validation
> > > > > > > > > > > > > > > > somewhere in the KIP-500 world.  If the
> > > Controller
> > > > > > > doesn't
> > > > > > > > > do it
> > > > > > > > > > > > > > > > within the registration request then the
> broker
> > > will
> > > > > > > have to
> > > > > > > > > > > make a
> > > > > > > > > > > > > > > > metadata request to the Controller, retrieve
> the
> > > > > Cluster
> > > > > > > ID,
> > > > > > > > > and
> > > > > > > > > > > > > > > > perform the check itself.  It feels to me
> that it
> > > > > might
> > > > > > > be
> > > > > > > > > > > better for
> > > > > > > > > > > > > > > > the Controller to just do it, and then the
> broker
> > > > > doesn't
> > > > > > > > > have to
> > > > > > > > > > > > > > > > worry about it anymore once it successfully
> > > > > registers.
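[Editor's note: the controller-side check discussed above could be sketched roughly as follows.  This is a toy model only; the class name, error name, and numeric code are illustrative, not the actual Kafka implementation.]

```java
import java.util.Objects;

// Hypothetical sketch of the cluster ID check during broker registration.
// The controller compares its own cluster ID against the one the broker
// sent in its registration request, and fails the registration on a
// mismatch -- so the broker never has to fetch and check the ID itself.
final class RegistrationValidator {
    static final short NONE = 0;
    static final short INCONSISTENT_CLUSTER_ID = 104;  // illustrative code

    static short validateRegistration(String controllerClusterId,
                                      String requestClusterId) {
        if (!Objects.equals(controllerClusterId, requestClusterId)) {
            return INCONSISTENT_CLUSTER_ID;  // reject: wrong cluster
        }
        return NONE;  // accept
    }
}
```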
>
> > I also have a question about the broker.id value and meta.properties.
> > The KIP now says "In version 0 of meta.properties, there is a broker.id
> > field.  Version 1 does not have this field.  It is no longer needed
> > because we no longer support dynamic broker id assignment."  But then
> > there is an example version 1 meta.properties file that shows the
> > broker.id value.  I actually wonder if maybe the broker.id value would
> > be good to keep in the version 1 meta.properties file, because it
> > currently (pre-KIP-500, version 0) acts as a sanity check to make sure
> > the broker is using the correct log directory.  Similarly with the
> > controller.id value on controllers -- it would allow the same type of
> > sanity check for quorum controllers.
>
> That's a good point.  I will add broker.id back, and also add
> controller.id as a possibility.
>
> cheers,
> Colin
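[Editor's note: a version 1 meta.properties file that keeps the broker.id sanity-check field, as agreed above, might look something like the following.  The values are made up for illustration, and the exact field set is whatever the KIP finally specifies.]

```properties
# Illustrative version 1 meta.properties, with broker.id retained as a
# sanity check that the broker is pointed at the correct log directory.
version=1
cluster.id=Nk018hRAQFytWskYqtQduw
broker.id=1
```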
>
> > On Mon, Nov 30, 2020 at 7:41 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > On Fri, Oct 23, 2020, at 16:10, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few more comments.
> > >
> > > Hi Jun,
> > >
> > > Thanks again for the reply.  Sorry for the long hiatus.  I was on
> > > vacation for a while.
> > >
> > > > 55. There is still text that favors new broker registration. "When a
> > > > broker first starts up, when it is in the INITIAL state, it will
> > > > always "win" broker ID conflicts.  However, once it is granted a
> > > > lease, it transitions out of the INITIAL state.  Thereafter, it may
> > > > lose subsequent conflicts if its broker epoch is stale.  (See
> > > > KIP-380 for some background on broker epoch.)  The reason for
> > > > favoring new processes is to accommodate the common case where a
> > > > process is killed with kill -9 and then restarted.  We want it to be
> > > > able to reclaim its old ID quickly in this case."
> > >
> > > Thanks for the reminder.  I have clarified the language here.
> > > Hopefully it is now clear that we don't allow quick re-use of broker
> > > IDs.
> > >
> > > > 80.1 Sounds good. Could you document that listeners is a required
> > > > config now? It would also be useful to annotate other required
> > > > configs. For example, controller.connect should be required.
> > >
> > > I added a note specifying that these are required.
> > >
> > > > 80.2 Could you list all deprecated existing configs? Another one is
> > > > control.plane.listener.name, since the controller no longer sends
> > > > LeaderAndIsr, UpdateMetadata and StopReplica requests.
> > >
> > > I added a section specifying some deprecated configs.
> > >
> > > > 83.1 It seems that the broker can transition from FENCED to RUNNING
> > > > without registering for a new broker epoch. I am not sure how this
> > > > works. Once the controller fences a broker, there is no need for the
> > > > controller to keep the broker epoch around. So the fenced broker's
> > > > heartbeat request with the existing broker epoch will be rejected,
> > > > leading the broker back to the FENCED state again.
> > >
> > > The broker epoch refers to the broker registration.  So we DO keep the
> > > broker epoch around even while the broker is fenced.
> > >
> > > The broker epoch changes only when there is a new broker registration.
> > > Fencing or unfencing the broker doesn't change the broker epoch.
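[Editor's note: the epoch semantics described here -- epoch tied to registration and retained across fencing -- can be captured in a small model.  None of the names below come from the actual implementation; this is only a sketch of the stated behavior.]

```java
// Toy model: the broker epoch is assigned at registration time and kept
// by the controller even while the broker is fenced.  A heartbeat that
// carries the current registration epoch can therefore move the broker
// FENCED -> RUNNING without a new registration; only a stale epoch
// (from an old registration) is rejected.
final class BrokerRecord {
    final long registrationEpoch;  // set once per registration
    boolean fenced;

    BrokerRecord(long registrationEpoch) {
        this.registrationEpoch = registrationEpoch;
        this.fenced = true;  // assume fenced until the first valid heartbeat
    }

    // Fencing does NOT discard or change the epoch.
    void fence() {
        fenced = true;
    }

    boolean handleHeartbeat(long heartbeatEpoch) {
        if (heartbeatEpoch != registrationEpoch) {
            return false;  // stale registration: reject
        }
        fenced = false;    // unfence without re-registering
        return true;
    }
}
```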
> > >
> > > > 83.5 Good point on KIP-590. Then should we expose the controller for
> > > > debugging purposes? If not, should we deprecate the controllerID
> > > > field in MetadataResponse?
> > >
> > > I think it's OK to expose it for now, with the proviso that it won't
> > > be reachable by clients.
> > >
> > > > 90. We rejected the shared ID with just one reason: "This is not a
> > > > good idea because NetworkClient assumes a single ID space.  So if
> > > > there is both a controller 1 and a broker 1, we don't have a way of
> > > > picking the "right" one." This doesn't seem to be a strong reason.
> > > > For example, we could address the NetworkClient issue with the node
> > > > type as you pointed out, or by using the negative value of a broker
> > > > ID as the controller ID.
> > >
> > > It would require a lot of code changes to support multiple types of
> > > node IDs.  It's not clear to me that the end result would be better --
> > > I tend to think it would be worse, since it would be more complex.  In
> > > a similar vein, using negative numbers seems dangerous, since we use
> > > negatives or -1 as "special values" in many places.  For example, -1
> > > often represents "no such node."
> > >
> > > One important thing to keep in mind is that we want to be able to
> > > transition from a broker and a controller being co-located to them no
> > > longer being co-located.  This is much easier to do when they have
> > > separate IDs.
> > >
> > > > 100. In KIP-589
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller>,
> > > > the broker reports all offline replicas due to a disk failure to the
> > > > controller. It seems this information needs to be persisted to the
> > > > metadata log. Do we have a corresponding record for that?
> > >
> > > Hmm, I have to look into this a little bit more.  We may need a new
> > > record type.
> > >
> > > > 101. Currently, the StopReplica request has 2 modes, without
> > > > deletion and with deletion. The former is used for controlled
> > > > shutdown and handling disk failure, and causes the follower to
> > > > stop. The latter is for topic deletion and partition reassignment,
> > > > and causes the replica to be deleted. Since we are deprecating
> > > > StopReplica, could we document what triggers the stopping of a
> > > > follower and the deleting of a replica now?
> > >
> > > RemoveTopic triggers deletion.  In general, the functionality of
> > > StopReplica is subsumed by the metadata records.
> > >
> > > > 102. Should we include the metadata topic in the MetadataResponse?
> > > > If so, when will it be included and what will the metadata response
> > > > look like?
> > >
> > > No, it won't be included in the metadata response sent back from the
> > > brokers.
> > >
> > > > 103. "The active controller assigns the broker a new broker epoch,
> > > > based on the latest committed offset in the log." This seems
> > > > inaccurate, since the latest committed offset doesn't always advance
> > > > on every log append.
> > >
> > > Given that the new broker epoch won't be visible until the commit has
> > > happened, I have changed this to "the next available offset in the
> > > log."
> > >
> > > > 104. REGISTERING(1): It says "Otherwise, the broker moves into the
> > > > FENCED state."  It seems this should be RUNNING?
> > > >
> > > > 105. RUNNING: Should we require the broker to catch up to the
> > > > metadata log to get into this state?
> > >
> > > For 104 and 105, these sections have been reworked.
> > >
> > > best,
> > > Colin
> > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > > >
> > > > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > > > Hi Colin,
> > > > > >
> > > > > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > > > Hi Colin.  Thanks for the hard work on this KIP.
> > > > > > > >
> > > > > > > > I have some questions about what happens to a broker when it
> > > > > > > > becomes fenced (e.g. because it can't send a heartbeat
> > > > > > > > request to keep its lease).  The KIP says "When a broker is
> > > > > > > > fenced, it cannot process any client requests.  This
> > > > > > > > prevents brokers which are not receiving metadata updates or
> > > > > > > > that are not receiving and processing them fast enough from
> > > > > > > > causing issues to clients."  And in the description of the
> > > > > > > > FENCED(4) state it likewise says "While in this state, the
> > > > > > > > broker does not respond to client requests."  It makes sense
> > > > > > > > that a fenced broker should not accept producer requests --
> > > > > > > > I assume any such requests would result in
> > > > > > > > NotLeaderOrFollowerException.  But what about KIP-392 (fetch
> > > > > > > > from follower) consumer requests?  It is conceivable that
> > > > > > > > these could continue.  Related to that, would a fenced
> > > > > > > > broker continue to fetch data for partitions where it thinks
> > > > > > > > it is a follower?  Even if it rejects consumer requests, it
> > > > > > > > might still continue to fetch as a follower.  Might it be
> > > > > > > > helpful to clarify both decisions here?
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Good question.  I think a fenced broker should continue to
> > > > > > > fetch on partitions it was already fetching before it was
> > > > > > > fenced, unless it hits a problem.  At that point it won't be
> > > > > > > able to continue, since it doesn't have the new metadata.  For
> > > > > > > example, it won't know about leadership changes in the
> > > > > > > partitions it's fetching.  The rationale for continuing to
> > > > > > > fetch is to try to avoid disruptions as much as possible.
> > > > > > >
> > > > > > > I don't think fenced brokers should accept client requests.
> > > > > > > The issue is that the fenced broker may or may not have any
> > > > > > > data it is supposed to have.  It may or may not have applied
> > > > > > > any configuration changes, etc. that it is supposed to have
> > > > > > > applied.  So it could get pretty confusing, and also
> > > > > > > potentially waste the client's time.
> > > > > >
> > > > > > When fenced, how would the broker reply to a client which did
> > > > > > make a request?
> > > > >
> > > > > Hi Tom,
> > > > >
> > > > > The broker will respond with a retryable error in that case.  Once
> > > > > the client has re-fetched its metadata, it will no longer see the
> > > > > fenced broker as part of the cluster.  I added a note to the KIP.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Tom
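[Editor's note: the fenced-broker behavior described in this exchange -- reject client requests with a retryable error so the client refreshes its metadata -- could be sketched as below.  The class and error names are illustrative, not the actual Kafka error codes.]

```java
// Hypothetical sketch: a fenced broker gates client requests and answers
// with a retryable error instead of serving them, prompting the client to
// refresh its metadata (which will no longer include the fenced broker).
final class RequestGate {
    enum Error { NONE, RETRYABLE_BROKER_FENCED }  // illustrative names

    private volatile boolean fenced;

    void setFenced(boolean fenced) {
        this.fenced = fenced;
    }

    // Called before dispatching any produce/fetch request from a client.
    Error checkClientRequest() {
        return fenced ? Error.RETRYABLE_BROKER_FENCED : Error.NONE;
    }
}
```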
