On Tue, Jul 7, 2020, at 18:27, Ron Dagostino wrote:
> Hi Colin. Thanks for the KIP. Here is some feedback and various questions.
Hi Ron,

Thanks for taking a look!

> "*Controller processes will listen on a separate port from brokers. This
> will be true even when the broker and controller are co-located in the same
> JVM*". I assume it is possible that the port numbers could be the same when
> using separate JVMs (i.e. broker uses port 9192 and controller also uses
> port 9192). I think it would be clearer to state this along these lines:
> "Controller nodes will listen on a port, and the controller port must
> differ from any port that a broker in the same JVM is listening on. In
> other words, a controller and a broker node, when in the same JVM, do not
> share ports"

I changed it to "Controller processes will listen on a separate endpoint
from brokers" to avoid confusion.

> I think the sentence "*In the realm of ACLs, this translates to controllers
> requiring CLUSTERACTION on CLUSTER for all operations*" is confusing. It
> feels to me that you can just delete it. Am I missing something here?

Hmm. I think it is important to be specific about what ACLs we plan to
require. CLUSTERACTION on CLUSTER is what we generally use for
broker-to-broker operations. I'm not sure how to improve this. Maybe a
link to some background material would help?

> The KIP states "*The metadata will be stored in memory on all the active
> controllers.*" Can there be multiple active controllers? Should it
> instead read "The metadata will be stored in memory on all potential
> controllers." (or something like that)?

Oh, this should have just been "The metadata will be stored in memory on
all the controllers." Fixed.

> KIP-595 states "*we have assumed the name __cluster_metadata for this
> topic, but this is not a formal part of this proposal*". This KIP-631
> states "*Metadata changes need to be persisted to the __metadata log
> before we propagate them to the other nodes in the cluster.
> This means waiting for the metadata log's last stable offset to advance to
> the offset of the change.*" Are we here formally defining "__metadata" as
> the topic name, and should these sentences refer to "__metadata topic"
> rather than "__metadata log"?

I think the place to have the discussion about the name of the metadata
topic is here. The other KIPs only refer to this tangentially. We've had a
few suggestions in the past, but the current one in the doc is
__kafka_metadata. Thinking about it a bit more, though, maybe we should use
something that isn't valid as a normal topic name, to avoid conflicts.
Perhaps ~metadata would be good.

> What are the "other nodes in the cluster" that are referred to? These are
> not controller nodes but brokers, right?

It's intended to be all nodes: controllers and brokers. After all, the
other controllers also need to mirror the metadata.

> If so, then should we say "before we propagate them to the brokers"?
> Technically we have a controller cluster and a broker cluster -- two
> separate clusters, correct? (Even though we could potentially share JVMs
> and therefore require no additional processes.) If the statement is
> referring to nodes in both clusters then maybe we should state "before we
> propagate them to the other nodes in the controller cluster or to brokers."
>
> "*The controller may have several of these uncommitted changes in flight at
> any given time. In essence, the controller's in-memory state is always a
> little bit in the future compared to the current state. This allows the
> controller to continue doing things while it waits for the previous changes
> to be committed to the Raft log.*" Should the three references above be to
> the active controller rather than just the controller?

Good point. I fixed this to refer to the "active controller."
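To make that "a little bit in the future" idea concrete, here is a rough
sketch in Python. This is purely illustrative -- it is not the actual
controller code, and all of the class and method names are made up -- but
it shows the basic pattern: changes are applied to the active controller's
in-memory state immediately, while only changes at or below the log's last
stable offset become part of the visible, committed state.

```python
class ActiveController:
    """Illustrative sketch only -- not real controller code."""

    def __init__(self):
        self.state = {}        # committed ("current") state, visible to others
        self.pending = []      # (offset, key, value) changes not yet committed
        self.next_offset = 0

    def propose(self, key, value):
        """Apply a change to the in-memory 'future' state right away,
        but remember it as uncommitted. Returns the change's log offset."""
        offset = self.next_offset
        self.next_offset += 1
        self.pending.append((offset, key, value))
        return offset

    def future_state(self):
        """What the active controller itself sees: committed state plus
        all in-flight changes."""
        view = dict(self.state)
        for _, key, value in self.pending:
            view[key] = value
        return view

    def handle_commit(self, last_stable_offset):
        """Called when the metadata log's last stable offset advances.
        Only now do pending changes become visible 'current' state."""
        still_pending = []
        for offset, key, value in self.pending:
            if offset <= last_stable_offset:
                self.state[key] = value
            else:
                still_pending.append((offset, key, value))
        self.pending = still_pending
```

The point of the pattern is that the active controller can keep several
changes in flight at once, while anyone else only ever observes state that
has been durably committed to the metadata log.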
> "*Therefore, the controller must not make this future state "visible" to
> the rest of the cluster until it has been made persistent – that is, until
> it becomes current state*". Again I wonder if this should refer to "active"
> controller, and indicate "anyone else" as opposed to "the rest of the
> cluster" since we are talking about 2 clusters here?

Fixed.

> "*When the active controller decides that it itself should create a
> snapshot, it will first try to give up the leadership of the Raft quorum.*"
> Why? Is it necessary to state this? It seems like it might be an
> implementation detail rather than a necessary constraint/requirement that
> we declare publicly and would have to abide by.

It seems in scope here, since it describes how the controller interacts
with the snapshotting mechanism. An implementation detail would be
something more like a class name or code, I think.

> "*It will reject brokers whose metadata is too stale*". Why? An example
> might be helpful here.

There's a bit more context in KIP-500. Stale metadata is a general problem
since it can confuse clients -- they may not see a topic which actually
does exist, for example, or may see one that was deleted.

> "*it may lose subsequent conflicts if its broker epoch is stale*" This is
> the first time a "broker epoch" is mentioned. I am assuming it is the
> controller epoch communicated to it (if any). It would be good to
> introduce it / explicitly state what it is before referring to it.

Broker epoch was first described in KIP-380. I added a link to that KIP
for some background. ( https://cwiki.apache.org/confluence/x/-oGzBQ )

best,
Colin

> Ron
>
> On Tue, Jul 7, 2020 at 6:48 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi all,
> >
> > I posted a KIP about how the quorum-based controller envisioned in
> > KIP-500 will work. Please take a look here:
> > https://cwiki.apache.org/confluence/x/4RV4CQ
> >
> > best,
> > Colin