Hi Jason,

It's amazing to see this coming together :)

I haven't had a chance to read in detail, but I read the outline and a few 
things jumped out at me.

First, for every epoch that is 32 bits rather than 64, I wonder whether 
that's a good long-term choice.  I keep reading about issues like this one: 
https://issues.apache.org/jira/browse/ZOOKEEPER-1277 .  Granted, that JIRA is 
about zxid, which increments much faster than we expect these leader epochs 
to, but it would still be good to see some rough calculations of how long 32 
bits (or really, 31 bits, since the values are signed) will last us in each 
case where we're using it, and what space savings we're actually getting.  It 
seems like in most cases the tradeoff may not be worth it?
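
For what it's worth, here's the kind of back-of-the-envelope math I have in 
mind.  The bump rates are assumptions I made up for illustration, and 
EpochHorizon isn't a real class anywhere:

    // Back-of-the-envelope: time until a signed 32-bit epoch overflows.
    // The bump rates below are illustrative assumptions, not measurements.
    public class EpochHorizon {
        public static void main(String[] args) {
            final long maxEpochs = Integer.MAX_VALUE;       // 2^31 - 1
            final double secondsPerYear = 365.25 * 24 * 3600;
            // one bump per hour, one per second, and a zxid-like 1000/sec
            double[] bumpsPerSecond = {1.0 / 3600, 1.0, 1000.0};
            for (double rate : bumpsPerSecond) {
                double years = maxEpochs / rate / secondsPerYear;
                System.out.printf("%10.4f bumps/sec -> ~%.1f years%n",
                    rate, years);
            }
        }
    }

At one bump per second (already pathological for leader elections), 31 bits 
lasts about 68 years; a zxid-style counter at 1000 bumps per second overflows 
in under a month.  So the answer probably depends on exactly which counters 
get the 32-bit treatment.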

Another thing I've been thinking about is how we do bootstrapping.  I would 
prefer to be in a world where formatting a new Kafka node was a first class 
operation explicitly initiated by the admin, rather than something that 
happened implicitly when you started up the broker and things "looked blank."

The first problem is that things can "look blank" accidentally if the storage 
system is having a bad day.  Clearly in the non-Raft world, this leads to data 
loss if the broker that is (re)started this way was the leader for some 
partitions.

The second problem is that we have a chicken-and-egg problem with certain 
configuration keys.  For example, maybe you want to configure some connection 
security settings in your cluster, but you don't want them to ever be stored 
in a plaintext config file.  (SCRAM passwords, for example.)  You could use a 
broker API to set the configuration, but that just restates the 
chicken-and-egg problem: the broker needs to be configured to know how to 
talk to you, but you need to talk to it in order to configure it.  Using an 
external secret manager like Vault is one way to solve this, but not everyone 
uses an external secret manager.

quorum.voters seems like a similar configuration key.  In the current KIP, this 
is only read if there is no other configuration specifying the quorum voter 
set.  If we had a kafka.mkfs command, we wouldn't need this key because we 
could assume that there was always quorum information stored locally.
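
To illustrate (the host names are invented, and kafka.mkfs is hypothetical, 
per the above):

    # Static bootstrap, as in the current KIP:
    quorum.voters=1@kafka-1:9092,2@kafka-2:9092,3@kafka-3:9092

    # Hypothetical mkfs-style alternative -- format once, store locally:
    # kafka.mkfs --voters 1@kafka-1:9092,2@kafka-2:9092,3@kafka-3:9092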

best,
Colin


On Thu, Apr 16, 2020, at 16:44, Jason Gustafson wrote:
> Hi All,
> 
> I'd like to start a discussion on KIP-595:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum.
> This proposal specifies a Raft protocol to ultimately replace Zookeeper as
> documented in KIP-500. Please take a look and share your thoughts.
> 
> A few minor notes to set the stage a little bit:
> 
> - This KIP does not specify the structure of the messages used to represent
> metadata in Kafka, nor does it specify the internal API that will be used
> by the controller. Expect these to come in later proposals. Here we are
> primarily concerned with the replication protocol and basic operational
> mechanics.
> - We expect many details to change as we get closer to integration with
> the controller. Any changes we make will be made either as amendments to
> this KIP or, in the case of larger changes, as new proposals.
> - We have a prototype implementation which I will put online within the
> next week which may help in understanding some details. It has diverged a
> little bit from our proposal, so I am taking a little time to bring it in
> line. I'll post an update to this thread when it is available for review.
> 
> Finally, I want to mention that this proposal was drafted by myself, Boyang
> Chen, and Guozhang Wang.
> 
> Thanks,
> Jason
>
