> On Feb. 8, 2018, 8:22 p.m., Vinod Kone wrote: > > src/master/master.cpp > > Lines 2186 (patched) > > <https://reviews.apache.org/r/65571/diff/1/?file=1954571#file1954571line2186> > > > > s/Leader detector indicated no master elected/No master was elected/ > > > > More importantly, this changes the semantics a bit. Previously if this > > master was the current leader it committed suicide even in this case. But > > we don't anymore. Is that what we want? > > > > > > Also, where in the interface does it say that None() is retryable. It > > says retryable errors are handled internally by the detector? > > Benno Evers wrote: > Ok, I guess I misinterpreted the documentation on > `MasterDetector::detect()`: > > ``` > * Returns MasterInfo after an election has occurred and the elected > * master is different than that specified (if any), or NONE if an > * election occurs and no master is elected (e.g., all masters are > * lost). A failed future is returned if the detector is unable to > * detect the leading master due to a non-retryable error. > ``` > > Since electing no master sounds like an error, and the future is not > failed, I assumed that this case was implicitly classifid as retryable error. > > Anyways, for the semantics I assume that not aborting is at least what > the original author intended, otherwise there would be no point to > differentiate between `Error` and `None` in the API. > > Additionally, the code that causes the master to crash in this situation > was introduced relatively recently (11 July 2017, a8c7ae44c8), before that > the `detected()`-handler would have just set `leader` to `None` and quietly > continued. So I would argue that this fix is actually restoring the > previously existing behaviour, not changing it. > > Vinod Kone wrote: > "...before that the detected()-handler would have just set leader to None > and quietly continued". Is this true? AFAICT the commit you mentioned only > adds leader domain related changes, doesn't change the behavior we are > talking about? See: https://reviews.apache.org/r/59763/
Oh, sorry, I thought you were talking about the case where a non-leading master is passed `None`, which also would have crashed before this fix but doesn't crash anymore. I've updated the code path to restore the suicide in the case where `None` is passed to a leading master. - Benno ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65571/#review197103 ----------------------------------------------------------- On Feb. 13, 2018, 8:05 p.m., Benno Evers wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65571/ > ----------------------------------------------------------- > > (Updated Feb. 13, 2018, 8:05 p.m.) > > > Review request for mesos, Andrei Budnik and Vinod Kone. > > > Bugs: MESOS-8550 > https://issues.apache.org/jira/browse/MESOS-8550 > > > Repository: mesos > > > Description > ------- > > The function `MasterDetector::detect()` returns a value of type > `Future<Option<MasterInfo>>`, which, according to its documentation, > can be `None` if an election occured and no master is elected. > > However, the code in `Master::detected()` was only handling the > cases of a failed future or a valid `MasterInfo` object. > > > Diffs > ----- > > src/master/master.cpp d7d22866f7a4eb87bd8949efafc97e828e7d4b94 > src/tests/cluster.hpp ad2b80e658d2f8afcefe9969d62cd33f0c475ce9 > src/tests/master_tests.cpp 28663c7a77096943949350abb3d13f9c04505f5b > > > Diff: https://reviews.apache.org/r/65571/diff/3/ > > > Testing > ------- > > `./mesos-tests` > > > Thanks, > > Benno Evers > >
