Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

Ron Dagostino Sun, 25 Oct 2020 03:05:03 -0700

Hi Colin and Jun.

Regarding these issues:

83.1 It seems that the broker can transition from FENCED to RUNNING
without registering for a new broker epoch. I am not sure how this
works. Once the controller fences a broker, there is no need for the
controller to keep the broker epoch around. So the fenced broker's
heartbeat request with the existing broker epoch will be rejected,
leading the broker back to the FENCED state again.

104. REGISTERING(1): It says "Otherwise, the broker moves into the FENCED
state." It seems this should be RUNNING?

When would/could a broker re-register -- i.e. send
BrokerRegistrationRequest again once it receives a
BrokerRegistrationResponse containing no error and its broker epoch?
The text states that "Once the period has elapsed, if the broker has
not renewed its registration via a heartbeat, it must re-register."
But the broker state machine only mentions any type of
registration-related event in the REGISTERING state ("While in this
state, the broker tries to register with the active controller");
there is no other broker state in the text that mentions the
possibility of re-registering, and the broker state machine has no
transition back to the REGISTERING state.
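
As a rough illustration of that gap, the sketch below (Python, purely
illustrative) encodes the broker states named in this thread and checks
whether REGISTERING can be reached again. The transition table is an
assumption pieced together from this discussion, not a copy of the KIP's
state machine, and the helper names are hypothetical.

```python
from enum import Enum


class BrokerState(Enum):
    INITIAL = "INITIAL"
    REGISTERING = "REGISTERING"
    FENCED = "FENCED"
    RUNNING = "RUNNING"
    PENDING_CONTROLLED_SHUTDOWN = "PENDING_CONTROLLED_SHUTDOWN"


# ASSUMPTION: transitions as read from this thread only, not from the KIP text.
TRANSITIONS = {
    BrokerState.INITIAL: {BrokerState.REGISTERING},
    BrokerState.REGISTERING: {BrokerState.FENCED},
    BrokerState.FENCED: {BrokerState.RUNNING,
                         BrokerState.PENDING_CONTROLLED_SHUTDOWN},
    BrokerState.RUNNING: {BrokerState.FENCED,
                          BrokerState.PENDING_CONTROLLED_SHUTDOWN},
    BrokerState.PENDING_CONTROLLED_SHUTDOWN: set(),
}


def states_that_can_reach(target):
    """Return every state with a direct transition into `target`."""
    return {src for src, dsts in TRANSITIONS.items() if target in dsts}


def can_reregister(state):
    """True if REGISTERING is reachable from `state` by one or more transitions."""
    seen, stack = set(), [state]
    while stack:
        current = stack.pop()
        for nxt in TRANSITIONS[current]:
            if nxt is BrokerState.REGISTERING:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Under these assumed transitions only INITIAL leads into REGISTERING, so a
broker that is FENCED or RUNNING has no path back to it, which is why the
"it must re-register" wording has nothing to attach to.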

Also, the text now states that there are "three broker registration
states: unregistered, registered but fenced, and registered and
active." It would be good to map these onto the formal broker state
machine so we know which "registration states" a broker can be in for
each state within its broker state machine.  It is not clear if there
is a way for a broker to go backwards into the "unregistered" broker
registration state.  I suspect it can only flip-flop between
registered but fenced/registered and active as the broker flip-flops
between ACTIVE and FENCED, and this would imply that a broker is never
strictly required to re-register -- though the option isn't precluded.

Does a broker JVM keep its assigned broker epoch throughout the life
of the JVM?  The BrokerRegistrationRequest includes a place for the
broker to specify its current broker epoch, but that would only be
useful if the broker is re-registering.  If a broker were to
re-register, the data in the request might seem to imply that it could
do so to specify dynamic changes to its features or endpoints, but
those dynamic changes happen centrally, so that doesn't seem to be a
valid reason to re-register.  So I do not yet see a reason for
re-registering despite the text "if the broker has not renewed its
registration via a heartbeat, it must re-register."

It feels to me that a broker would keep its epoch throughout the life
of its JVM and it would never re-register, and the controller would
remember/maintain the broker epoch when it fences a broker; the broker
would continue to try sending heartbeat requests while it is fenced,
and it would continue to do so until the process is killed via an
external signal.  If the controller eventually does respond with the
broker's next state then that next state will either be ACTIVE
(meaning communication has been restored; the return broker epoch will
be the same one that the broker JVM has had throughout its lifetime
and that it provided in the heartbeat request); or the next state will
be PENDING_CONTROLLED_SHUTDOWN if some other JVM process has since
started with the same broker ID.
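
A minimal sketch of that proposed behavior, assuming the controller keeps
a fenced broker's epoch on record, might look like the following. Every
name here (ControllerSketch, Registration, the method names, the string
state values) is hypothetical and chosen for illustration; none of it is
taken from the KIP or from Kafka's code.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    epoch: int
    fenced: bool = False


@dataclass
class ControllerSketch:
    """Hypothetical controller bookkeeping, for illustration only."""
    next_epoch: int = 1
    registrations: dict = field(default_factory=dict)  # broker ID -> Registration

    def register(self, broker_id):
        """A newly started process registers and gets a fresh epoch,
        replacing whatever registration existed for that broker ID."""
        epoch = self.next_epoch
        self.next_epoch += 1
        self.registrations[broker_id] = Registration(epoch)
        return epoch

    def fence(self, broker_id):
        """Fence the broker but keep its epoch on record."""
        self.registrations[broker_id].fenced = True

    def heartbeat(self, broker_id, epoch):
        """Return the next state for the broker that sent this heartbeat."""
        current = self.registrations.get(broker_id)
        if current is None or current.epoch != epoch:
            # Unknown ID or stale epoch. In this sketch both are treated as
            # "another process now owns this broker ID".
            return "PENDING_CONTROLLED_SHUTDOWN"
        # Same epoch as before: communication is restored, so unfence.
        current.fenced = False
        return "ACTIVE"
```

In this sketch a fenced broker that keeps heartbeating with its original
epoch is simply unfenced, and it is told to shut down only after a later
register() call for the same broker ID has replaced its epoch.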

I hope that helps the discussion.  Thanks for the great questions,
Jun, and your hard work and responses, Colin.

Ron

On Sat, Oct 24, 2020 at 4:08 AM Tom Bentley <tbent...@redhat.com> wrote:
>
> Hi Colin,
>
> Which error code in particular though? Because so far as I'm aware there's
> no existing error code which really captures this situation and creating a
> new one would not be backward compatible.
>
> Cheers,
>
> Tom
>
> On Sat, Oct 24, 2020 at 12:20 AM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Colin,
> >
> > Thanks for the reply. A few more comments.
> >
> > 55. There is still text that favors new broker registration. "When a broker
> > first starts up, when it is in the INITIAL state, it will always "win"
> > broker ID conflicts.  However, once it is granted a lease, it transitions
> > out of the INITIAL state.  Thereafter, it may lose subsequent conflicts if
> > its broker epoch is stale.  (See KIP-380 for some background on broker
> > epoch.)  The reason for favoring new processes is to accommodate the common
> > case where a process is killed with kill -9 and then restarted.  We want it
> > to be able to reclaim its old ID quickly in this case."
> >
> > 80.1 Sounds good. Could you document that listeners is a required config
> > now? It would also be useful to annotate other required configs. For
> > example, controller.connect should be required.
> >
> > 80.2 Could you list all deprecated existing configs? Another one is
> > control.plane.listener.name since the controller no longer sends
> > LeaderAndIsr, UpdateMetadata and StopReplica requests.
> >
> > 83.1 It seems that the broker can transition from FENCED to RUNNING without
> > registering for a new broker epoch. I am not sure how this works. Once the
> > controller fences a broker, there is no need for the controller to keep the
> > broker epoch around. So the fenced broker's heartbeat request with the
> > existing broker epoch will be rejected, leading the broker back to the
> > FENCED state again.
> >
> > 83.5 Good point on KIP-590. Then should we expose the controller for
> > debugging purposes? If not, we should deprecate the controllerID field in
> > MetadataResponse?
> >
> > 90. We rejected the shared ID with just one reason "This is not a good idea
> > because NetworkClient assumes a single ID space.  So if there is both a
> > controller 1 and a broker 1, we don't have a way of picking the "right"
> > one." This doesn't seem to be a strong reason. For example, we could
> > address the NetworkClient issue with the node type as you pointed out or
> > using the negative value of a broker ID as the controller ID.
> >
> > 100. In KIP-589
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller>,
> > the broker reports all offline replicas due to a disk failure to the
> > controller. It seems this information needs to be persisted to the metadata
> > log. Do we have a corresponding record for that?
> >
> > 101. Currently, StopReplica request has 2 modes, without deletion and with
> > deletion. The former is used for controlled shutdown and handling disk
> > failure, and causes the follower to stop. The latter is for topic deletion
> > and partition reassignment, and causes the replica to be deleted. Since we
> > are deprecating StopReplica, could we document what triggers the stopping
> > of a follower and the deleting of a replica now?
> >
> > 102. Should we include the metadata topic in the MetadataResponse? If so,
> > when will it be included and what will the metadata response look like?
> >
> > 103. "The active controller assigns the broker a new broker epoch, based on
> > the latest committed offset in the log." This seems inaccurate since the
> > latest committed offset doesn't always advance on every log append.
> >
> > 104. REGISTERING(1): It says "Otherwise, the broker moves into the FENCED
> > state." It seems this should be RUNNING?
> >
> > 105. RUNNING: Should we require the broker to catch up to the metadata log
> > to get into this state?
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > Hi Colin,
> > > >
> > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > Hi Colin.  Thanks for the hard work on this KIP.
> > > > > >
> > > > > > > I have some questions about what happens to a broker when it becomes
> > > > > > > fenced (e.g. because it can't send a heartbeat request to keep its
> > > > > > > lease).  The KIP says "When a broker is fenced, it cannot process any
> > > > > > > client requests.  This prevents brokers which are not receiving
> > > > > > > metadata updates or that are not receiving and processing them fast
> > > > > > > enough from causing issues to clients." And in the description of the
> > > > > > > FENCED(4) state it likewise says "While in this state, the broker does
> > > > > > > not respond to client requests."  It makes sense that a fenced broker
> > > > > > > should not accept producer requests -- I assume any such requests
> > > > > > > would result in NotLeaderOrFollowerException.  But what about KIP-392
> > > > > > > (fetch from follower) consumer requests?  It is conceivable that these
> > > > > > > could continue.  Related to that, would a fenced broker continue to
> > > > > > > fetch data for partitions where it thinks it is a follower?  Even if
> > > > > > > it rejects consumer requests it might still continue to fetch as a
> > > > > > > follower.  Might it be helpful to clarify both decisions here?
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > > Good question.  I think a fenced broker should continue to fetch on
> > > > > > partitions it was already fetching before it was fenced, unless it hits a
> > > > > > problem.  At that point it won't be able to continue, since it doesn't have
> > > > > > the new metadata.  For example, it won't know about leadership changes in
> > > > > > the partitions it's fetching.  The rationale for continuing to fetch is to
> > > > > > try to avoid disruptions as much as possible.
> > > > > >
> > > > > > I don't think fenced brokers should accept client requests.  The issue is
> > > > > > that the fenced broker may or may not have any data it is supposed to
> > > > > > have.  It may or may not have applied any configuration changes, etc. that
> > > > > > it is supposed to have applied.  So it could get pretty confusing, and also
> > > > > > potentially waste the client's time.
> > > > >
> > > > >
> > > > When fenced, how would the broker reply to a client which did make a
> > > > request?
> > > >
> > >
> > > Hi Tom,
> > >
> > > > The broker will respond with a retryable error in that case.  Once the
> > > > client has re-fetched its metadata, it will no longer see the fenced broker
> > > > as part of the cluster.  I added a note to the KIP.
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Tom
> > > >
> > >
> >
