Re: Review Request 59746: Stopped accounting aborted container launches as failures.

Greg Mann Wed, 27 Sep 2017 15:35:51 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review186459
-----------------------------------------------------------





src/slave/slave.cpp
Lines 5230-5232 (patched)
<https://reviews.apache.org/r/59746/#comment263073>

    It looks to me like the significant functional difference in this patch (in 
addition to the metrics change) is that we will no longer call 
`containerizer->destroy(containerId)` when the executor state is TERMINATING.
    
    After looking at the code, it seems to me that in all cases where the 
executor state is TERMINATING, we have either already called 
`containerizer->destroy`, or we have registered a callback which will do so. 
So, this seems safe to me, but I wanted to write this comment to see if our 
understandings of the change and its impact are correct.



src/slave/slave.cpp
Lines 5225-5229 (original), 5233-5237 (patched)
<https://reviews.apache.org/r/59746/#comment263011>

    What do you think about moving this log line up into the `if 
(!future.isReady())` block? In the case where `executor == nullptr`, the logs 
would not tell us if the future wasn't ready, which seems potentially useful 
when debugging.



src/slave/slave.cpp
Lines 5250-5253 (patched)
<https://reviews.apache.org/r/59746/#comment263072>

    This comment is a bit confusing to me. I'm not sure if this captures the 
precise meaning you want, but here's a suggestion:
    
    "Executor's `pendingTermination` may have already been set by a callback 
further down the chain. However, we are likely to have more context for this 
failure here, thus we overwrite the termination to ensure that the state, 
reason, and message are set correctly."


- Greg Mann


On Sept. 27, 2017, 4:47 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
> 
> (Updated Sept. 27, 2017, 4:47 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Ian Downes, Jie Yu, Joseph Wu, Jan 
> Schlicht, and Vinod Kone.
> 
> 
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The container launch future might be failed or discarded (depending
> on the containerizer implementation) if the launch has been aborted,
> for example, a framework might have stopped while its task are being
> started. Such failures should not be accounted as launch errors.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
> 
> 
> Diff: https://reviews.apache.org/r/59746/diff/3/
> 
> 
> Testing
> -------
> 
> `make check` on several Linux distros.
> 
> Additional manual tests for (1) mesos and (1) docker containerizers. The 
> framework is asked to exit right after it submits the task to mesos.
> 
> (1) With mesos c-zer
> m: `./bin/mesos-master.sh --work_dir=./m`
> a: `GLOG_v=1 sudo ./bin/mesos-agent.sh --master=<ip>:5050 --work_dir=./a 
> --containerizers=mesos --image_providers="DOCKER" 
> --isolation=filesystem/linux,docker/runtime`
> f: `./src/mesos-execute --master=<ip>:5050 --containerizer=mesos 
> --docker_image=fedora:25 --name=pull-test --command="sleep 1000"`
> 
> (2) With docker c-zer
> m: `./bin/mesos-master.sh --work_dir=./m`
> a: `GLOG_v=1 sudo ./bin/mesos-agent.sh --master=<ip>:5050 --work_dir=./a 
> --containerizers=docker`
> f: `./src/mesos-execute --master=<ip>:5050 --containerizer=docker 
> --docker_image=fedora:25 --name=pull-test --command="sleep 1000"`
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Re: Review Request 59746: Stopped accounting aborted container launches as failures.

Reply via email to