----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66232/#review199975 -----------------------------------------------------------
Ship it! Ship It! - Alexander Rukletsov On March 23, 2018, 5:49 p.m., Gaston Kleiman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66232/ > ----------------------------------------------------------- > > (Updated March 23, 2018, 5:49 p.m.) > > > Review request for mesos, Alexander Rukletsov, Joseph Wu, Qian Zhang, and > Vinod Kone. > > > Bugs: MESOS-8530 > https://issues.apache.org/jira/browse/MESOS-8530 > > > Repository: mesos > > > Description > ------- > > The default executor uses checks to ensure that it is subscribed to the > agent before killing a task/container; if it is not, it will skip kill > calls or crash. > > When killing a task/container, the default executor performs the > following actions: > > 1) Calls `KILL_NESTED_CONTAINER`. > 2) Enqueues a task status update (for tasks only). > > None of these actions require an active subscription. > > When the checks were added: > > 1) The default executor supported launching only one task group, so it > was correct to assume that killing a task would eventually trigger the > destruction of all the child containers. > 2) It was assumed that the executor will only be unsubscribed for brief > periods of time, mostly during agent recovery, when kill calls are > likely to fail. > 3) There was no kill/escalation retry logic. > > The checks were added as a way of working around the lack of retry logic > for kill requests, relying on the fact that crashing the executor leads > to the destruction of all the child containers. > > This chain adds retry logic for failed kill call escalation, making the > workaround unnecessary. > > Now that the default executor supports running multiple task groups, the > checks are not just unnecessary, but also invalid and dangerous. If a > check fails, all the containers started by the executor will be killed, > regardless of which task group they belong to. This is bad and could > lead to data loss. > > This patch removes these unnecessary and sometimes invalid checks from > the default executor. > > > Diffs > ----- > > src/launcher/default_executor.cpp 906836f3b8e0af79d7c61f90fd8a95f193b26e84 > > > Diff: https://reviews.apache.org/r/66232/diff/2/ > > > Testing > ------- > > `sudo bin/mesos-tests.sh` on GNU/Linux > > > Thanks, > > Gaston Kleiman > >
