cf-natali opened a new pull request, #433:
URL: https://github.com/apache/mesos/pull/433

   This test would randomly fail with:
   ```
   18:16:59 3: F0501 17:16:59.192818 19175 slave.cpp:1445] Check failed:
      state == DISCONNECTED || state == RUNNING || state == TERMINATING
   RECOVERING
   ```
   
   The cause is that the test restarts the slave with the same PID, so
   timers started by the previous slave process can still fire while the
   new slave process is running.
   
   In this specific case, the previous slave's ping timer fired in the
   middle of the second slave instance's recovery, triggering the check
   failure above.
   
   Fixed by calling `Clock::advance` and `Clock::settle` after
   terminating the first instance, so that no timers from it are still
   pending when the second instance starts.
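   
   To illustrate the race, here is a minimal, self-contained toy model. It
   is not libprocess and not the code in this PR: `World`, `Agent`,
   `staleTimerHitsRecovery`, the PID string `slave(1)` and the 15-tick
   delay are all hypothetical names and values chosen for the sketch. It
   assumes timers are addressed by PID and that a timer whose target no
   longer exists is simply dropped.
   
   ```cpp
   #include <cassert>
   #include <map>
   #include <string>
   #include <vector>
   
   // Toy model only; every name below is hypothetical.
   enum class State { RECOVERING, RUNNING };
   
   struct Agent {
     explicit Agent(State s) : state(s) {}
     State state;
     bool checkFailed = false;
     // Stand-in for the ping handler: it must not run during recovery.
     void ping() { if (state == State::RECOVERING) checkFailed = true; }
   };
   
   struct World {
     long now = 0;
     std::multimap<long, std::string> timers;  // deadline -> target PID
     std::map<std::string, Agent*> processes;  // PID -> live process
   
     void schedule(long delay, const std::string& pid) {
       timers.emplace(now + delay, pid);
     }
   
     // Advance time and deliver every expired timer synchronously, so the
     // "settle" step is implicit here. Timers whose PID has no live
     // process are dropped.
     void advance(long delta) {
       now += delta;
       auto end = timers.upper_bound(now);
       for (auto it = timers.begin(); it != end; ++it) {
         auto target = processes.find(it->second);
         if (target != processes.end()) target->second->ping();
       }
       timers.erase(timers.begin(), end);
     }
   };
   
   // Returns true if the first instance's ping timer reaches the second
   // instance while it is still recovering.
   bool staleTimerHitsRecovery(bool drainAfterTerminate) {
     World world;
     const std::string pid = "slave(1)";
   
     Agent first(State::RUNNING);
     world.processes[pid] = &first;
     world.schedule(15, pid);     // first instance arms its ping timer
   
     world.processes.erase(pid);  // first instance terminates
     if (drainAfterTerminate) {
       world.advance(15);         // stale timer fires with no target
     }
   
     Agent second(State::RECOVERING);
     world.processes[pid] = &second;  // restart under the same PID
     world.advance(15);               // time passes during recovery
     return second.checkFailed;
   }
   ```
   
   In this model, `staleTimerHitsRecovery(false)` reports the stale ping
   reaching the recovering instance, while `staleTimerHitsRecovery(true)`
   does not, because the old timer has already fired and been discarded
   before the PID is reused.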
   
   Tested by running the test in a loop while a CPU-intensive workload
   (`stress-ng --cpu $(nproc)0`) ran in parallel.

