> On Dec. 16, 2016, 7:04 p.m., Greg Mann wrote:
> > src/tests/slave_tests.cpp, lines 2733-2736
> > <https://reviews.apache.org/r/54803/diff/1/?file=1587155#file1587155line2733>
> >
> > In some cases you explicitly advance the clock by the backoff factor,
> > while here you resume the clock instead. What do you think about
> > consistently using `Clock::advance(flags.registration_backoff_factor);` in
> > order to establish a common pattern?
I think it's a great idea. I didn't do it here because I wasn't sure how far to advance the clock; in particular, it's not clear to me whether we might end up waiting a multiple of `registration_backoff_factor`. What value would you pass to `Clock::advance` here?

- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159462
-----------------------------------------------------------


On Dec. 16, 2016, 2:47 a.m., Alex Clemmer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54803/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2016, 2:47 a.m.)
> 
> 
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg
> Mann, John Kordich, Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent uses `delay`
> to register with the Master at a random time in the future, to avoid the
> thundering herd problem after a Master failover. The authentication
> codepath, in contrast, schedules the registration immediately.
> 
> In tests that have called `Clock::pause` by the time the Agent is supposed
> to register, the authentication codepath will succeed, while the
> no-authentication codepath will hang forever, because the delayed
> registration never fires while the clock is paused.
> 
> A more detailed analysis of this situation is in MESOS-6803.
> 
> This commit resolves the issue for `slave_tests.cpp` by changing the tests
> not to use `Clock::pause` while waiting for Agent registration.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp fc6b56c074c71b827a9ee522cd715c0d15ecc7e3
> 
> Diff: https://reviews.apache.org/r/54803/diff/
> 
> 
> Testing
> -------
> 
> Added a `delay` to the call to `authenticate` in `Slave::detected`, ran the
> tests to find the failing tests in `SlaveTest.*`, fixed them, then ran the
> tests again.
> 
> 
> Thanks,
> 
> Alex Clemmer
>
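To make the "how far to advance" question concrete, here is a minimal, self-contained C++ simulation. It is not the libprocess `Clock` API; `FakeClock`, `randomRegistrationDelay`, and `registersAfter` are hypothetical names. It assumes, which the thread does not state precisely, that the first registration attempt is delayed by a uniformly random fraction of `registration_backoff_factor`, so the delay falls in [0, factor]. Under that assumption, a single advance by the backoff factor is enough to fire the first attempt, while a paused clock that is never advanced never fires it. Any retries with a growing backoff would need further advances.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <random>
#include <utility>

using Duration = std::chrono::milliseconds;

// Hypothetical stand-in for a paused test clock (NOT libprocess `Clock`):
// time moves only when advance() is called, and a scheduled timer fires
// once the clock reaches its deadline.
class FakeClock {
public:
  // Schedule `callback` to run `after` the current (paused) time.
  void delay(Duration after, std::function<void()> callback) {
    timers_.emplace(now_ + after, std::move(callback));
  }

  // Move time forward and fire every timer whose deadline has been reached.
  void advance(Duration amount) {
    now_ += amount;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      std::function<void()> callback = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      callback();
    }
  }

private:
  Duration now_{0};
  std::multimap<Duration, std::function<void()>> timers_;
};

// Assumed model of the no-authentication path: wait a uniformly random
// fraction of `backoffFactor` before the first registration attempt.
Duration randomRegistrationDelay(Duration backoffFactor, std::mt19937& rng) {
  assert(backoffFactor.count() >= 0);  // A negative factor makes no sense.
  std::uniform_real_distribution<double> fraction(0.0, 1.0);
  return std::chrono::duration_cast<Duration>(backoffFactor * fraction(rng));
}

// Returns whether the simulated agent has registered after the paused
// clock is advanced by `advanceBy`.
bool registersAfter(Duration backoffFactor, Duration advanceBy, unsigned seed) {
  std::mt19937 rng(seed);
  FakeClock clock;
  bool registered = false;
  clock.delay(randomRegistrationDelay(backoffFactor, rng),
              [&registered] { registered = true; });
  clock.advance(advanceBy);
  return registered;
}
```

For example, `registersAfter(Duration(1000), Duration(1000), seed)` should return `true` for every seed, whereas a timer on a `FakeClock` that is never advanced never fires, which mirrors the hang described above.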
