> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11235-11239 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11235>
> >
> > Maybe a TODO that we can use the in-memory registry here if we made it
> > injectable in StartMaster?
> >
> > (The benefit being that the tests run faster with the in-memory one).

Added.


> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11247 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11247>
> >
> > We can just avoid this variable and passing it in to StartSlave since
> > we're using the default flags?

Inlined.


> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11268 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11268>
> >
> > Don't need to settle here since we're just waiting for mark after.

Fixed.


- Ilya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73131/#review222447
-----------------------------------------------------------


On Jan. 11, 2021, 5:23 p.m., Ilya Pronin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/73131/
> -----------------------------------------------------------
>
> (Updated Jan. 11, 2021, 5:23 p.m.)
>
>
> Review request for mesos and Benjamin Mahler.
>
>
> Bugs: MESOS-10209
>     https://issues.apache.org/jira/browse/MESOS-10209
>
>
> Repository: mesos
>
>
> Description
> -------
>
> During master failover if agent reregistration runs concurrently with
> marking the agent as unreachable and finishes before the MarkUnreachable
> operation is complete, the assertion that the agent is in the recovered
> set in Master::_markUnreachable() doesn't hold. The reason for this is
> because after readmitting the agent the master removes it from the
> recovered set in Master::__reregisterSlave().
>
> We can fix this by ignoring agent reregistration requests while a
> marking unreachable operation is in progress, similarly to how we do it
> for marking gone. Once the marking operation is complete, the agent will
> be able to reregister as usual.
>
>
> Diffs
> -----
>
>   src/master/master.cpp 164720a3ad40773b6de0268e3a7119de04bf297e
>   src/tests/master_tests.cpp cd0973ed4cc8fc33de714d59c7680aef05b97b47
>
>
> Diff: https://reviews.apache.org/r/73131/diff/1/
>
>
> Testing
> -------
>
> Ran `make check`. Verified that the new test crashes without the fix.
>
>
> Thanks,
>
> Ilya Pronin
>
>
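
For readers skimming the thread, the race and the fix described in the review request can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyMaster`, its members, and `simulateFailoverRace()` are hypothetical names invented for illustration, the asynchronous registry operation is modelled as two explicit calls, and none of this is the actual code in src/master/master.cpp.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for the master's agent bookkeeping. All names are
// hypothetical; this is not the code from src/master/master.cpp.
struct ToyMaster
{
  std::set<std::string> recovered;          // Known from registry, not yet back.
  std::set<std::string> markingUnreachable; // Mark-unreachable op in flight.
  std::set<std::string> registered;
  std::set<std::string> unreachable;

  // Starts the (conceptually asynchronous) mark-unreachable operation.
  void markUnreachable(const std::string& agent)
  {
    if (recovered.count(agent) == 0) {
      return; // Nothing to do for agents that are not in the recovered set.
    }
    markingUnreachable.insert(agent);
  }

  // Handles a reregistration request. Returns false if it was ignored.
  bool reregister(const std::string& agent)
  {
    // The idea from the description: ignore the request while a
    // mark-unreachable operation is in progress; the agent can retry.
    if (markingUnreachable.count(agent) > 0) {
      return false;
    }

    recovered.erase(agent);
    unreachable.erase(agent);
    registered.insert(agent);
    return true;
  }

  // Completion of the mark-unreachable operation.
  void completeMarkUnreachable(const std::string& agent)
  {
    // Without the guard in reregister(), a concurrent reregistration
    // could already have removed the agent from `recovered` and this
    // check would fail.
    assert(recovered.count(agent) > 0);

    markingUnreachable.erase(agent);
    recovered.erase(agent);
    unreachable.insert(agent);
  }
};

// Replays the interleaving from the description against the toy model.
bool simulateFailoverRace()
{
  ToyMaster master;
  master.recovered.insert("agent-1");

  master.markUnreachable("agent-1");                // Operation starts.
  bool ignored = !master.reregister("agent-1");     // Arrives mid-operation.
  master.completeMarkUnreachable("agent-1");        // Operation finishes.
  bool readmitted = master.reregister("agent-1");   // Retry goes through.

  return ignored && readmitted;
}
```

Calling `simulateFailoverRace()` from a `main()` walks through the interleaving: the first reregistration is ignored, the marking completes with the agent still in the recovered set, and the retried reregistration succeeds. Removing the guard at the top of `reregister()` makes the assertion in `completeMarkUnreachable()` fire, which mirrors the crash the new test is reported to reproduce without the fix.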
