Hi Jack,

> I've recently modelled the BookKeeper protocol in TLA+ and can confirm that,
> once confirmed, an entry is not replayed to another bookie.
Should I assume that you modeled it after the code? Otherwise, what did you use as a reference? Is the TLA+ spec available anywhere? It sounds like a good development.

> once confirmed, an entry is not replayed to another bookie.

I'd like to understand this a bit better. I think this is saying that if I have an entry e that is written to only AQ of the WQ bookies (AQ < WQ), and at least one bookie b in the ledger ensemble crashes before it writes e, then e is considered confirmed, and when b is replaced with b' for the ledger, e is not replicated on b'. If that's the case, then isn't it a bug?

> the new data integrity check that Ivan worked on, when run periodically,
> will be able to repair that hole.

This is good, but I'm not sure it is a replacement for a proper fix.

Please let me know if I'm missing anything.

-Flavio

> On 11 Jan 2021, at 09:31, Jack Vanlightly <jvanligh...@splunk.com.INVALID>
> wrote:
>
> Hi,
>
> I've recently modelled the BookKeeper protocol in TLA+ and can confirm that,
> once confirmed, an entry is not replayed to another bookie. This
> leaves a "hole", as the entry is now replicated to only 2 bookies; however,
> the new data integrity check that Ivan worked on, when run periodically,
> will be able to repair that hole.
>
> Jack
>
> On Sat, Jan 9, 2021 at 1:06 AM Venkateswara Rao Jujjuri <jujj...@gmail.com>
> wrote:
>
>> On Fri, Jan 8, 2021 at 2:29 PM Matteo Merli <matteo.me...@gmail.com>
>> wrote:
>>
>>> On Fri, Jan 8, 2021 at 2:15 PM Venkateswara Rao Jujjuri
>>> <jujj...@gmail.com> wrote:
>>>>
>>>>> otherwise the write will time out internally and it will get replayed to a
>>>>> new bookie.
>>>> If Qa is met and the writes of Qw-Qa fail after we send the success to the
>>>> client, why would the write be replayed on a new bookie?
>>>
>>> I think the original intention was to avoid having 1 bookie with a
>>> "hole" in the entries sequence.
>>> If you then lose one of the 2 bookies,
>>> it would be difficult to know which entries need to be recovered.
>>>
>>
>> @Matteo Merli <matteo.me...@gmail.com> I don't believe we retry the write
>> on a bookie if Qa is satisfied and the write to a bookie timed out.
>> Once the entry is ack'ed to the client we move the LAC and can't
>> retroactively change the active segment's ensemble.
>>
>>> will get replayed to a new bookie
>> This will happen only if we are not able to satisfy Qa and go through
>> ensemble changes.
>> We change the ensemble and retry the write only if a bookie write fails
>> before Qa is satisfied.
>> We have added a new feature called handling "delayed write failure", but
>> that applies only to new entries, not retroactively.
>>
>> I may be missing something here, and not understanding your point.
>>
>> Thanks,
>> JV
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then
>> you win. - Mahatma Gandhi
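A minimal Python sketch of the client-side write-path rule JV describes, under loudly stated simplifying assumptions: `SimplifiedLedgerWriter`, `PendingEntry` and their methods are hypothetical names, not the BookKeeper client API; entries are assumed to be striped round-robin across the ensemble; and an "ensemble change" just swaps the failed bookie in place rather than starting a new ledger segment. It illustrates the point under discussion: a bookie failure before AQ acks arrive triggers an ensemble change and a replay, while a failure after the entry is confirmed leaves it on fewer than WQ bookies (the "hole" Jack and Flavio mention).

```python
from dataclasses import dataclass, field


@dataclass
class PendingEntry:
    entry_id: int
    write_set: list                      # bookies this entry is sent to (WQ of them)
    acked_by: set = field(default_factory=set)
    confirmed: bool = False              # True once AQ acks have been received


class SimplifiedLedgerWriter:
    """Hypothetical model of the write path discussed in the thread.

    Not the real BookKeeper client: names and structure are illustrative only.
    """

    def __init__(self, ensemble, write_quorum, ack_quorum, spare_bookies):
        assert 0 < ack_quorum <= write_quorum <= len(ensemble)
        self.ensemble = list(ensemble)
        self.wq = write_quorum
        self.aq = ack_quorum
        self.spares = list(spare_bookies)
        self.entries = {}
        self.lac = -1                    # last add confirmed

    def add(self, entry_id):
        # Assumption: round-robin striping of entries across the ensemble.
        n = len(self.ensemble)
        write_set = [self.ensemble[(entry_id + i) % n] for i in range(self.wq)]
        self.entries[entry_id] = PendingEntry(entry_id, write_set)
        return write_set

    def on_ack(self, entry_id, bookie):
        entry = self.entries[entry_id]
        entry.acked_by.add(bookie)
        if not entry.confirmed and len(entry.acked_by) >= self.aq:
            entry.confirmed = True       # success is returned to the client
            # Advance the LAC over every contiguous confirmed entry.
            while self.lac + 1 in self.entries and self.entries[self.lac + 1].confirmed:
                self.lac += 1

    def on_write_failure(self, entry_id, bookie):
        """Return the bookie the entry is replayed to, or None if no replay."""
        entry = self.entries[entry_id]
        if entry.confirmed:
            # AQ already met: no retroactive ensemble change and no replay,
            # so this entry stays on fewer than WQ bookies (the "hole").
            return None
        # AQ not yet met: swap in a spare bookie and retry the write there.
        replacement = self.spares.pop(0)
        self.ensemble[self.ensemble.index(bookie)] = replacement
        entry.write_set[entry.write_set.index(bookie)] = replacement
        return replacement

    def replica_count(self, entry_id):
        # Number of bookies that acknowledged the entry.
        return len(self.entries[entry_id].acked_by)


if __name__ == "__main__":
    w = SimplifiedLedgerWriter(["b1", "b2", "b3"], write_quorum=3, ack_quorum=2,
                               spare_bookies=["b4"])
    w.add(0)                              # write set: b1, b2, b3
    w.on_ack(0, "b1")
    w.on_ack(0, "b2")                     # AQ met: entry 0 confirmed, LAC = 0
    print(w.on_write_failure(0, "b3"))    # None: no replay after confirmation
    print(w.replica_count(0))             # 2: the entry has a "hole" on b3
    w.add(1)                              # write set: b2, b3, b1
    w.on_ack(1, "b2")
    print(w.on_write_failure(1, "b3"))    # b4: ensemble change before AQ was met
```

The sketch deliberately leaves out the "delayed write failure" handling JV mentions, which per the thread only affects new entries; a periodic integrity check, as Jack suggests, would be the component that notices entry 0 has fewer than WQ copies.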