Hi Jack, thanks for providing this material. I'll spend some time
digging into it.

On Mon, Sep 11, 2023 at 6:56 PM Jack Vanlightly <vanligh...@apache.org> wrote:
>
> Hi all,
>
> I agree that we should have the protocol description and specification in
> the Kafka repository. I have two other specifications to contribute
> including the KRaft implementation of Raft and the next gen consumer group
> protocol (WIP). I also have a formal prose description of the Kafka
> replication protocol for how it works today. That should appear under the
> 3.5 directory sometime this week.
>
> Regarding moving this kind of stuff into the Kafka repo, there are a number
> of additional topics such as:
>
>    - Where to place formal prose descriptions and specifications of
>    protocols in the Kafka repo.

I'm thinking about a dedicated folder (i.e. tlaplus) with a main
README.md where we briefly explain what's contained.

>    - Given we may add more specifications and potentially other formal
>    prose descriptions, how to structure things?

Each spec would live in its own sub folder under specs, along with a
README.md containing the prose description. I think we should also
provide the configuration files, and a bash script to easily run the
model checker on them. Wdyt?

>    - Do we use markdown like I have done here with svg files (created in
>    Excalidraw with the Excalidraw bits embedded in the svg)., or do we use
>    LaTex? Some people may wish to read this directly on GitHub and perhaps
>    others like a PDF. Markdown is more accessible and perhaps more likely to
>    fulfil the living document approach.

I agree that Markdown is more convenient and will increase the chance
that these specifications will receive updates.

>
> Thanks
> Jack
>
>
> On Mon, Sep 11, 2023 at 1:45 PM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > This is very useful Jack!
> >
> > 1. We are missing a replication protocol specification in our project.
> > Ideally it should be a living document and adding what you wrote to
> > the docs/design would be a great start towards that goal.
> > 2. I have also been building on top of some existing TLA+ to add
> > changes to replication protocol brought by features such as Tiered
> > Storage, local retention etc. at
> >
> > https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
> > 3. Apart from verifying correctness, I believe a TLA+ model will also
> > help developers quickly iterate through their fundamental assumptions
> > while making changes. As an example, we recently had a discussion in
> > the community on whether to increase leader epoch with shrink/expand
> > ISR or not. There was another discussion on whether we can choose the
> > replica with the largest end offset as the new leader to reduce
> > truncation. In both these cases, we could have quickly modified the
> > existing TLA+ model, run through it and verify that the assumptions
> > still hold true. It would be great if we can take such discussions as
> > an example and demonstrate how TLA+ could have benefitted the
> > community. It would help make the case for adding the TLA+ spec as
> > part of the community owned project itself.
> >
> > Right now, things are a bit busy on my end, but I am looking forward
> > to exploring what you shared above in the coming weeks (perhaps a
> > first review by end of sept).
> >
> > Thank you again for starting this conversation.
> >
> > --
> > Divij Vaidya
> >
> > On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada <ocadar...@gmail.com> wrote:
> > >
> > > Hi Jack,
> > >
> > > Thank you for the great work, not only the spec but also for the
> > > comprehensive documentation about the replication.
> > > Actually I wrote some TLA+ spec to verify unclean leader election
> > behavior
> > > before so I will double-check my understanding with your complete spec :)
> > >
> > >
> > > Thanks,
> > >
> > > 2023年9月10日(日) 21:42 David Jacot <da...@apache.org>:
> > >
> > > > Hi Jack,
> > > >
> > > > This is great! Thanks for doing it. I will look into it when I have a
> > bit
> > > > of time, likely after Current.
> > > >
> > > > Would you be interested in contributing it to the main repository? That
> > > > would be a great contribution to the project. Having it there would
> > allow
> > > > the community to maintain it while changes to the protocol are made.
> > That
> > > > would also pave the way for having other specs in the future (e.g. new
> > > > consumer group protocol).
> > > >
> > > > Best,
> > > > David
> > > >
> > > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly <vanligh...@apache.org>
> > a
> > > > écrit :
> > > >
> > > > > Hi all,
> > > > >
> > > > > As part of my work on formally verifying different parts of Apache
> > Kafka
> > > > > and working on KIP-966 I have built up a lot of knowledge about how
> > the
> > > > > replication protocol works. Currently it is mostly documented across
> > > > > various KIPs and in the code itself. I have written a complete
> > protocol
> > > > > description (with KIP-966 changes applied) which is inspired by the
> > > > precise
> > > > > but accessible style and language of the Raft paper. The idea is
> > that it
> > > > > makes it easier for contributors and anyone else interested in the
> > > > protocol
> > > > > to learn how it works, the fundamental properties it has and how
> > those
> > > > > properties are supported by the various behaviors and conditions.
> > > > >
> > > > > It currently resides next to the TLA+ specification itself in my
> > > > > kafka-tlaplus repository. I'd be interested to receive feedback from
> > the
> > > > > community.
> > > > >
> > > > >
> > > > >
> > > >
> > https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> > > > >
> > > > > Thanks
> > > > > Jack
> > > > >
> > > >
> > >
> > >
> > > --
> > > ========================
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > ========================
> >

Reply via email to