> On Jan. 30, 2020, 12:14 a.m., Greg Mann wrote:
> > The patch looks great, thanks Andrei. What about adding a test for this, 
> > would it be hard? I'm imagining something like:
> > 1) kill a task under the default executor
> > 2) intercept the ACK from agent to executor
> > 3) verify that the executor is still running
> > 4) send the ACK to the executor
> > 5) verify that the executor has terminated
> > 
> > WDYT?
> 
> Andrei Budnik wrote:
>     How would one implement steps (2) and (4)? Is there an example
>     somewhere in the Mesos tests?

Yep, there are some places in the tests where we use `DROP_PROTOBUF` to 
intercept a message and then inject it manually with `process::post`; see 
`TaskStatusUpdateManagerTest.DuplicateUpdateBeforeAck`, for example.
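
To make the drop-then-inject flow concrete, here is a rough standalone sketch of the five proposed steps. It is a toy model, not Mesos code: `Ack`, `ExecutorModel`, `Channel`, and `runScenario` are hypothetical names made up for illustration. `Channel::dropNext` only stands in for the role `DROP_PROTOBUF` plays in the real tests, and `Channel::post` for the role of `process::post`; their real signatures are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the agent's acknowledgement message.
struct Ack {
  std::string taskId;
};

// Toy model of the executor: it counts as terminated only when no
// containers are running and no status updates are unacknowledged.
class ExecutorModel {
public:
  // Killing the task stops the container and leaves one terminal
  // status update waiting for an acknowledgement.
  void killTask() {
    runningContainers_ = 0;
    unackedUpdates_ = 1;
  }

  void receive(const Ack&) {
    if (unackedUpdates_ > 0) {
      --unackedUpdates_;
    }
  }

  bool terminated() const {
    return runningContainers_ == 0 && unackedUpdates_ == 0;
  }

private:
  int runningContainers_ = 1;
  int unackedUpdates_ = 0;
};

// Toy model of the agent-to-executor channel with a one-shot drop filter.
class Channel {
public:
  explicit Channel(ExecutorModel& executor) : executor_(executor) {}

  // Arm the filter: the next message sent is captured, not delivered.
  void dropNext() { dropping_ = true; }

  void send(const Ack& ack) {
    if (dropping_) {
      dropped_ = ack;
      hasDropped_ = true;
      dropping_ = false;
      return;
    }
    executor_.receive(ack);
  }

  // Deliver a message directly, bypassing the filter.
  void post(const Ack& ack) { executor_.receive(ack); }

  bool intercepted() const { return hasDropped_; }
  const Ack& droppedAck() const { return dropped_; }

private:
  ExecutorModel& executor_;
  bool dropping_ = false;
  bool hasDropped_ = false;
  Ack dropped_;
};

// Walks through the five proposed steps; returns true if each check holds.
bool runScenario() {
  ExecutorModel executor;
  Channel channel(executor);

  channel.dropNext();              // arm the filter before the ACK is sent
  executor.killTask();             // (1) kill the task
  channel.send(Ack{"task-1"});     // (2) the ACK is intercepted

  if (!channel.intercepted() || executor.terminated()) {
    return false;                  // (3) executor must still be running
  }

  channel.post(channel.droppedAck());  // (4) inject the captured ACK
  return executor.terminated();        // (5) executor has now terminated
}
```

The important detail is ordering: the filter has to be armed before the kill, otherwise the ACK can slip through before the test starts watching for it.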


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72029/#review219426
-----------------------------------------------------------


On Jan. 30, 2020, 3:28 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72029/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2020, 3:28 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko, Greg Mann, Qian Zhang, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8537
>     https://issues.apache.org/jira/browse/MESOS-8537
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the default executor terminated itself after all containers
> had terminated. This could cause the executor to terminate before the
> agent had processed a terminal status update. To mitigate this, the
> executor slept for one second to give itself a chance to send all
> status updates and receive all status update acknowledgements before
> terminating. That fixed delay might still have led to various race
> conditions in some circumstances (e.g., on a slow host).
> This patch terminates the default executor once all status updates have
> been acknowledged by the agent and no running containers are left.
> It also increases the timeout from one second to one minute as a
> fail-safe.
> 
> 
> Diffs
> -----
> 
>   src/launcher/default_executor.cpp 4369fd0052b2e8496ba63606fa57e17d881ea52c 
> 
> 
> Diff: https://reviews.apache.org/r/72029/diff/4/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
