Hi Jack, thanks for providing this material. I'll spend some time digging into it.
On Mon, Sep 11, 2023 at 6:56 PM Jack Vanlightly <vanligh...@apache.org> wrote: > > Hi all, > > I agree that we should have the protocol description and specification in > the Kafka repository. I have two other specifications to contribute > including the KRaft implementation of Raft and the next gen consumer group > protocol (WIP). I also have a formal prose description of the Kafka > replication protocol for how it works today. That should appear under the > 3.5 directory sometime this week. > > Regarding moving this kind of stuff into the Kafka repo, there are a number > of additional topics such as: > > - Where to place formal prose descriptions and specifications of > protocols in the Kafka repo. I'm thinking about a dedicated folder (i.e. tlaplus) with a main README.md where we briefly explain what's contained. > - Given we may add more specifications and potentially other formal > prose descriptions, how to structure things? Each spec would live in its own sub folder under specs, along with a README.md containing the prose description. I think we should also provide the configuration files, and a bash script to easily run the model checker on them. Wdyt? > - Do we use markdown like I have done here with svg files (created in > Excalidraw with the Excalidraw bits embedded in the svg)., or do we use > LaTex? Some people may wish to read this directly on GitHub and perhaps > others like a PDF. Markdown is more accessible and perhaps more likely to > fulfil the living document approach. I agree that Markdown is more convenient and will increase the chance that these specifications will receive updates. > > Thanks > Jack > > > On Mon, Sep 11, 2023 at 1:45 PM Divij Vaidya <divijvaidy...@gmail.com> > wrote: > > > This is very useful Jack! > > > > 1. We are missing a replication protocol specification in our project. > > Ideally it should be a living document and adding what you wrote to > > the docs/design would be a great start towards that goal. > > 2. I have also been building on top of some existing TLA+ to add > > changes to replication protocol brought by features such as Tiered > > Storage, local retention etc. at > > > > https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla > > 3. Apart from verifying correctness, I believe a TLA+ model will also > > help developers quickly iterate through their fundamental assumptions > > while making changes. As an example, we recently had a discussion in > > the community on whether to increase leader epoch with shrink/expand > > ISR or not. There was another discussion on whether we can choose the > > replica with the largest end offset as the new leader to reduce > > truncation. In both these cases, we could have quickly modified the > > existing TLA+ model, run through it and verify that the assumptions > > still hold true. It would be great if we can take such discussions as > > an example and demonstrate how TLA+ could have benefitted the > > community. It would help make the case for adding the TLA+ spec as > > part of the community owned project itself. > > > > Right now, things are a bit busy on my end, but I am looking forward > > to exploring what you shared above in the coming weeks (perhaps a > > first review by end of sept). > > > > Thank you again for starting this conversation. > > > > -- > > Divij Vaidya > > > > On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada <ocadar...@gmail.com> wrote: > > > > > > Hi Jack, > > > > > > Thank you for the great work, not only the spec but also for the > > > comprehensive documentation about the replication. > > > Actually I wrote some TLA+ spec to verify unclean leader election > > behavior > > > before so I will double-check my understanding with your complete spec :) > > > > > > > > > Thanks, > > > > > > 2023年9月10日(日) 21:42 David Jacot <da...@apache.org>: > > > > > > > Hi Jack, > > > > > > > > This is great! Thanks for doing it. I will look into it when I have a > > bit > > > > of time, likely after Current. > > > > > > > > Would you be interested in contributing it to the main repository? That > > > > would be a great contribution to the project. Having it there would > > allow > > > > the community to maintain it while changes to the protocol are made. > > That > > > > would also pave the way for having other specs in the future (e.g. new > > > > consumer group protocol). > > > > > > > > Best, > > > > David > > > > > > > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly <vanligh...@apache.org> > > a > > > > écrit : > > > > > > > > > Hi all, > > > > > > > > > > As part of my work on formally verifying different parts of Apache > > Kafka > > > > > and working on KIP-966 I have built up a lot of knowledge about how > > the > > > > > replication protocol works. Currently it is mostly documented across > > > > > various KIPs and in the code itself. I have written a complete > > protocol > > > > > description (with KIP-966 changes applied) which is inspired by the > > > > precise > > > > > but accessible style and language of the Raft paper. The idea is > > that it > > > > > makes it easier for contributors and anyone else interested in the > > > > protocol > > > > > to learn how it works, the fundamental properties it has and how > > those > > > > > properties are supported by the various behaviors and conditions. > > > > > > > > > > It currently resides next to the TLA+ specification itself in my > > > > > kafka-tlaplus repository. I'd be interested to receive feedback from > > the > > > > > community. > > > > > > > > > > > > > > > > > > > > > https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md > > > > > > > > > > Thanks > > > > > Jack > > > > > > > > > > > > > > > > > > -- > > > ======================== > > > Okada Haruki > > > ocadar...@gmail.com > > > ======================== > >