cf-natali commented on a change in pull request #388:
URL: https://github.com/apache/mesos/pull/388#discussion_r638138998
##########
File path: src/linux/cgroups.cpp
##########
@@ -1403,9 +1403,15 @@ class TasksKiller : public Process<TasksKiller>
protected:
void initialize() override
{
- // Stop when no one cares.
- promise.future().onDiscard(lambda::bind(
- static_cast<void (*)(const UPID&, bool)>(terminate), self(), true));
+ // We don't want to stop immediately upon discard, because
+ // it could leave the cgroup frozen which means that processes
+ // are stuck in uninterruptible sleep (D state), which is quite bad.
+ // So upon discard we still do our best and keep trying to
+ // kill the cgroup for up to FREEZE_RETRY_INTERVAL which should be
+ // a reasonable upper bound.
+ promise.future().onDiscard([this]() {
+ delay(FREEZE_RETRY_INTERVAL, self(), &Self::selfTerminate);
Review comment:
Hey @kamaradclimber, thanks for reviewing :).
It only delays the handling of the discard: the killer process keeps running
and does its usual freeze/kill/thaw, so it stops as soon as the tasks have
been killed and never waits longer than necessary.
Since a cgroup can usually be frozen and killed very quickly (a few ms), the
delay should have negligible impact.
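
To make the timing concrete, here is a rough sketch of the idea using a toy
virtual-time event loop. It is not libprocess and not the code in this PR:
`ToyLoop`, `Outcome` and `simulate` are hypothetical names made up for
illustration, and the millisecond values are arbitrary. `ToyLoop::delay` only
loosely mirrors what `delay(FREEZE_RETRY_INTERVAL, ...)` does in the diff above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Toy virtual-time event loop; a hypothetical stand-in, not libprocess.
struct ToyLoop {
  std::multimap<long, std::function<void()>> events;  // keyed by time in ms
  long now = 0;

  // Loose analogue of delay(): run `f` after `ms` of virtual time.
  void delay(long ms, std::function<void()> f) {
    assert(ms >= 0);
    events.emplace(now + ms, std::move(f));
  }

  // Run events in time order until none are left.
  void run() {
    while (!events.empty()) {
      auto it = events.begin();
      now = it->first;
      std::function<void()> f = std::move(it->second);
      events.erase(it);
      f();
    }
  }
};

struct Outcome {
  bool killed;     // true if the simulated freeze/kill/thaw completed
  long stoppedAt;  // virtual time (ms) at which the killer stopped
};

// workMs: how long the simulated freeze/kill/thaw takes.
// discardAtMs: when the caller discards the future.
// graceMs: plays the role of FREEZE_RETRY_INTERVAL (assumed value).
Outcome simulate(long workMs, long discardAtMs, long graceMs) {
  ToyLoop loop;
  bool running = true;
  Outcome outcome{false, -1};

  // Normal path: the killer stops by itself once the tasks are killed.
  loop.delay(workMs, [&]() {
    if (running) { running = false; outcome = {true, loop.now}; }
  });

  // Discard path: do not stop now, only arm a delayed self-termination.
  loop.delay(discardAtMs, [&]() {
    if (!running) return;  // already finished, nothing to do
    loop.delay(graceMs, [&]() {
      if (running) { running = false; outcome = {false, loop.now}; }
    });
  });

  loop.run();
  return outcome;
}
```

With these made-up numbers, `simulate(5, 0, 1000)` stops at 5 ms with the kill
completed, even though the discard arrived at time 0. Only a killer that never
finishes, e.g. `simulate(60000, 0, 1000)`, is cut off at the 1000 ms bound.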