Thanks Jack, this is quite helpful. I will check them out.

On Tue, Feb 23, 2021 at 12:52 AM Jack Vanlightly
<jvanligh...@splunk.com.invalid> wrote:

> In terms of tools, thinking processes, setup... you really need to start by
> learning about TLA+. I couldn't explain how it changes your engineering
> mindset/workflow better than Leslie Lamport himself. So if you're
> interested in this kind of thing then I highly recommend watching the TLA+
> video course by Leslie Lamport:
>
> https://urldefense.com/v3/__https://www.youtube.com/channel/UCajiu4Cj_GHOX0if3Up-eRA__;!!DCbAVzZNrAf4!T7-W6Mvn3QiJ-yydRe46KPWA8QdglLKQzuRtxcm1p0hOne9XwSQk-OFRo0izO-It_w$
> . I personally had
> to watch it more than once to understand some points.
>
> Leslie Lamport also wrote a book which has been helpful to me:
>
> https://urldefense.com/v3/__https://lamport.azurewebsites.net/tla/book.html__;!!DCbAVzZNrAf4!T7-W6Mvn3QiJ-yydRe46KPWA8QdglLKQzuRtxcm1p0hOne9XwSQk-OFRo0gS54kMaw$
> . The book will require a
> fair amount of study to understand it all. For me, the liveness stuff came
> later, after I'd already written a couple of safety oriented specs.
>
> Next up there are some examples you can learn from here:
>
> https://urldefense.com/v3/__https://github.com/tlaplus/CommunityModules__;!!DCbAVzZNrAf4!T7-W6Mvn3QiJ-yydRe46KPWA8QdglLKQzuRtxcm1p0hOne9XwSQk-OFRo0ij0OnNJA$
>
> When you start writing your first spec you will hit brick walls along the
> way. I highly recommend post your questions here:
>
> https://urldefense.com/v3/__https://groups.google.com/g/tlaplus__;!!DCbAVzZNrAf4!T7-W6Mvn3QiJ-yydRe46KPWA8QdglLKQzuRtxcm1p0hOne9XwSQk-OFRo0i76sUkQw$
> . The community is very responsive and
> helpful for all levels.
>
> Regular engineers can pick up enough TLA+ in 2-3 weeks to write a useful
> spec, you don't need a PhD in mathematics. Once you can write something
> simple, then find something real to model. If you don't have a specific
> thing you want to write a spec for then one thing you can do is take a
> distributed system that you know and look for a protocol bug that has since
> been fixed. Try to implement the most minimum spec that recreates the bug.
> An example of this I did was the Apache Kafka KIP 101 which was a data loss
> protocol bug.
>
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-101*-*Alter*Replication*Protocol*to*use*Leader*Epoch*rather*than*High*Watermark*for*Truncation__;KysrKysrKysrKysrKys!!DCbAVzZNrAf4!T7-W6Mvn3QiJ-yydRe46KPWA8QdglLKQzuRtxcm1p0hOne9XwSQk-OFRo0j6dCm0jg$
>
> Once you can put together a basic spec, find something that you want to
> verify, whether it exists already or it is a design proposal. I recently
> modelled Pravegas auto-scaling topics which is not too complex and makes
> for a good first spec (as well as being related to BookKeeper).
>
>
>
>
> On Mon, Feb 22, 2021 at 7:15 PM Karan Mehta <k.me...@salesforce.com
> .invalid>
> wrote:
>
> > [ External sender. Exercise caution. ]
> >
> > Awesome work Jack!
> >
> > I haven't worked with formal verification but have always been
> interested.
> > Can you share the tools, thinking processes, setup etc that went into
> this?
> >
> >
> > On Mon, Feb 22, 2021 at 9:19 AM Jack Vanlightly
> > <jvanligh...@splunk.com.invalid> wrote:
> >
> > > Hi all,
> > >
> > > I have written a TLA+ specification for the replication protocol. It
> > models
> > > the life of a single ledger and includes two writers, the original
> writer
> > > that creates the ledger and a second writer that can at any point (or
> > not)
> > > start the recovery process.
> > >
> > > It found one problem with the fencing that allowed for a data loss
> > scenario
> > > to occur. You can find the details of that particular problem in issue
> > 2614
> > > (
> > >
> >
> https://urldefense.com/v3/__https://github.com/apache/bookkeeper/issues/2614__;!!DCbAVzZNrAf4!VUV5pNTOHyZQOg1ZhwuWT7JIcxobv26MM14ErklQukaMnxsSftnbe-mMrmNe4qfOnA$
> > > ).
> > >
> > > The specification is here:
> > >
> >
> https://urldefense.com/v3/__https://github.com/Vanlightly/bookkeeper-tlaplus__;!!DCbAVzZNrAf4!VUV5pNTOHyZQOg1ZhwuWT7JIcxobv26MM14ErklQukaMnxsSftnbe-mMrmOuWn47BA$
> > >
> > > The state space for a 4 broker deployment with a Qw 3, Qa 2 ledger is
> > > pretty large and takes days to run on a 96 core beast. However, you can
> > run
> > > it with smaller ensemble sizes to run in mere hours.
> > >
> > > Check it out if you're interested in formal verification. Also if there
> > are
> > > other aspects of BookKeeper that could benefit from verification let me
> > > know.
> > >
> > > Best regards
> > > Jack
> > >
> > >
> > > --
> > > *Jack Vanlightly*
> > > Principal Software Engineer
> > > Splunk Inc.
> > > jvanligh...@splunk.com <kramas...@splunk.com>
> > > Barcelona
> > >
> >
> >
> > --
> > Karan Mehta
> >
> > <https://smart.salesforce.com/sig/k.mehta//us_mb/default/link.html>
> >
>


-- 
Karan Mehta

<https://smart.salesforce.com/sig/k.mehta//us_mb/default/link.html>

Reply via email to