-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38003/#review99112
-----------------------------------------------------------


src/tests/master_tests.cpp (line 3596)
<https://reviews.apache.org/r/38003/#comment155976>

    new line.


src/tests/master_tests.cpp (lines 3597 - 3598)
<https://reviews.apache.org/r/38003/#comment155979>

    // This test ensures that a slave gets a unique SlaveID even after
    // master fails over. Please refer to MESOS-3351 for further details.


src/tests/master_tests.cpp (lines 3607 - 3608)
<https://reviews.apache.org/r/38003/#comment155982>

    Why specify a mock executor and test containerizer? There's a StartSlave() overload that takes just the detector (and optionally flags), which you can use?


src/tests/master_tests.cpp (line 3622)
<https://reviews.apache.org/r/38003/#comment155985>

    // Start a new slave and make sure it registers before the old slave.


src/tests/master_tests.cpp (line 3630)
<https://reviews.apache.org/r/38003/#comment155986>

    // Now let the first slave re-register.


src/tests/master_tests.cpp (lines 3633 - 3634)
<https://reviews.apache.org/r/38003/#comment155988>

    // If both the slaves get the same SlaveID, the re-registration would fail here.


src/tests/master_tests.cpp (line 3636)
<https://reviews.apache.org/r/38003/#comment155989>

    Also add a CHECK_NE() check with both the slave ids?


src/tests/master_tests.cpp (line 3637)
<https://reviews.apache.org/r/38003/#comment155990>

    Does this test reliably fail (i.e., every time) without the code change in master.cpp?


- Vinod Kone


On Sept. 14, 2015, 6:08 p.m., Klaus Ma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38003/
> -----------------------------------------------------------
>
> (Updated Sept. 14, 2015, 6:08 p.m.)
>
>
> Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone.
>
>
> Bugs: MESOS-3351
>     https://issues.apache.org/jira/browse/MESOS-3351
>
>
> Repository: mesos
>
>
> Description
> -------
>
> __Phenomenon:__
> Under some race conditions, the slave was shut down after a master failover.
>
> __Root Cause:__
> The slave was shut down because of a duplicated SlaveID: in the master, the
> SlaveID is generated as masterInfo.id + "-S" + nextSlaveId; when the master
> fails over, nextSlaveId is reset to 0 and masterInfo.id (generated from
> date + ip + port + pid) may be unchanged, which leads to a duplicated SlaveID.
>
> __Solution/Fix:__
> Generate masterInfo.id from a UUID instead of "date + ip + port + pid".
>
>
> Diffs
> -----
>
>   src/master/master.cpp 5589eca
>   src/tests/master_tests.cpp 8a6b98b
>
> Diff: https://reviews.apache.org/r/38003/diff/
>
>
> Testing
> -------
>
> make
> make check
>
>
> Thanks,
>
> Klaus Ma
>
>
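
To make the root cause in the description concrete, here is a minimal standalone sketch of the collision and of why a random master id avoids it. It is not Mesos code: `MasterSketch`, `legacyMasterId`, `randomMasterId`, and `sketchCheck` are hypothetical names introduced for illustration, the sample date/ip/port/pid values are made up, and the 128-bit random hex string only stands in for a real UUID. The one detail taken from the description is the SlaveID scheme, masterInfo.id + "-S" + nextSlaveId.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the master's id bookkeeping (not the Mesos class).
struct MasterSketch {
  std::string id;                 // plays the role of masterInfo.id
  std::uint64_t nextSlaveId = 0;  // starts at 0 again in every new master

  explicit MasterSketch(std::string masterId) : id(std::move(masterId)) {}

  // Scheme from the description: masterInfo.id + "-S" + nextSlaveId.
  std::string newSlaveId() {
    return id + "-S" + std::to_string(nextSlaveId++);
  }
};

// Assumed shape of the old id: derived only from date, ip, port and pid,
// so identical inputs after a failover give an identical master id.
std::string legacyMasterId(const std::string& date, const std::string& ip,
                           int port, int pid) {
  return date + "-" + ip + "-" + std::to_string(port) + "-" +
         std::to_string(pid);
}

// 128 random bits rendered as 32 hex digits; a stand-in for a UUID,
// not an RFC 4122 implementation.
std::string randomMasterId() {
  std::random_device device;
  std::uniform_int_distribution<std::uint64_t> dist;
  std::ostringstream out;
  out << std::hex << std::setfill('0')
      << std::setw(16) << dist(device)
      << std::setw(16) << dist(device);
  return out.str();
}

// Walks through the failover scenario with both id schemes.
void sketchCheck() {
  // Old scheme: the restarted master reuses the same inputs and counter.
  MasterSketch oldMaster(legacyMasterId("20150914", "10.0.0.1", 5050, 1234));
  MasterSketch newMaster(legacyMasterId("20150914", "10.0.0.1", 5050, 1234));
  assert(oldMaster.newSlaveId() == newMaster.newSlaveId());  // duplicate id

  // Random scheme: each master instance gets its own prefix.
  MasterSketch first{randomMasterId()};
  MasterSketch second{randomMasterId()};
  assert(first.newSlaveId() != second.newSlaveId());
}
```

Calling `sketchCheck()` from a small `main()` exercises both cases. The second assertion holds only with overwhelming probability rather than by construction, which is the same guarantee a random UUID gives.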
