----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/73131/#review222447 -----------------------------------------------------------
Ship it! src/tests/master_tests.cpp Lines 11235-11239 (patched) <https://reviews.apache.org/r/73131/#comment311501> Maybe a TODO that we can use the in-memory registry here if we made it injectable in StartMaster? (The benefit being that the tests run faster with the in-memory one). src/tests/master_tests.cpp Lines 11247 (patched) <https://reviews.apache.org/r/73131/#comment311502> We can just avoid this variable and passing it in to StartSlave since we're using the default flags? src/tests/master_tests.cpp Lines 11268 (patched) <https://reviews.apache.org/r/73131/#comment311503> Don't need to settle here since we're just waiting for mark after. - Benjamin Mahler On Jan. 12, 2021, 1:23 a.m., Ilya Pronin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/73131/ > ----------------------------------------------------------- > > (Updated Jan. 12, 2021, 1:23 a.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS-10209 > https://issues.apache.org/jira/browse/MESOS-10209 > > > Repository: mesos > > > Description > ------- > > During master failover if agent reregistration runs concurrently with > marking the agent as unreachable and finishes before the MarkUnreachable > operation is complete, the assertion that the agent is in the recovered > set in Master::_markUnreachable() doesn't hold. The reason for this is > because after readmitting the agent the master removes it from the > recovered set in Master::__reregisterSlave(). > > We can fix this by ignoring agent reregistration requests while a > marking unreachable operation is in progress, similarly to how we do it > for marking gone. Once the marking operation is complete, the agent will > be able to reregister as usual. > > > Diffs > ----- > > src/master/master.cpp 164720a3ad40773b6de0268e3a7119de04bf297e > src/tests/master_tests.cpp cd0973ed4cc8fc33de714d59c7680aef05b97b47 > > > Diff: https://reviews.apache.org/r/73131/diff/1/ > > > Testing > ------- > > Ran `make check`. Verified that the new test crashes without the fix. > > > Thanks, > > Ilya Pronin > >
