cf-natali commented on pull request #388:
URL: https://github.com/apache/mesos/pull/388#issuecomment-850602132


   > @ridv This, unfortunately, is not a "small patch", but an attempt to 
compensate for a deficiency in a workaround in a code that uses a rather 
problematic kernel interface and is already full of hacks and short-term 
solutions.
   
   Indeed, this freezer code is quite tricky, and with good reason - at work we 
experienced 3 different kernel bugs involving the freezer cgroup, it's really a 
mess even after all those years...
   
   > @cf-natali I'm trying to look into this and to make sure this is not 
making some other issues worse.
   > 
   > For example, does this all mean that this deferred call that freezes a 
cgroup
   > 
   > 
https://github.com/apache/mesos/blob/85981d7c728798783b2e1e7cbed4f27a1497ccb2/src/linux/cgroups.cpp#L1440
   > 
   > has never been executed before this patch (due to immediate termination of 
a TaskKiller process), but, after this change, will be executing in some 
cases?...
   
   As far as I can tell, this should not exercise any new code path, and should 
actually make unusual code paths less frequent.
   The code you're referring to is only involved when the freeze operation 
timed out, and attempts to thaw and freeze it again - this can happen in normal 
circumstances, i.e. in the normal case where a cgroup is destroyed. This 
specific code path was exercised in the past - I checked using logs and 
`strace`, and will continue to be exercised if the freeze times out.
   
   One way to view this change is that it can only delay the termination of the 
`TaskKiller` process: therefore, the `TaskKiller` will only terminate in a 
state in which it could already be terminated in in the current code.
   The extra delay is a grace period letting some time to the `TaskKiller` to 
finish its freeze/kill/thaw cycle, hence making it much less likely to 
terminate it while it's in the middle of its freeze/kill/thaw.
   
   > (Btw, inablilty to comment on an untouched line in a github PR is one of 
examples why, as of 2019-2020, Mesos committers were still preferring to review 
contribution using Apache Reviewboard. I'm not saying that we should not do 
something to simplify contributing via GitHub, just illustrating why the RB has 
not been phased out.)
   
   Interesting, thanks!
   
   > I'm sorry, but reviewing such stuff meaningfully takes significant chunks 
of an uninterrupted time and should only be done in a very sound mind and a 
very good health (this was not my case in the beginnig of this week).
   
   I'm really sorry to hear that, and hope you're feeling better!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to