cf-natali commented on pull request #388: URL: https://github.com/apache/mesos/pull/388#issuecomment-850602132
> @ridv This, unfortunately, is not a "small patch", but an attempt to compensate for a deficiency in a workaround in a code that uses a rather problematic kernel interface and is already full of hacks and short-term solutions. Indeed, this freezer code is quite tricky, and with good reason - at work we experienced 3 different kernel bugs involving the freezer cgroup, it's really a mess even after all those years... > @cf-natali I'm trying to look into this and to make sure this is not making some other issues worse. > > For example, does this all mean that this deferred call that freezes a cgroup > > https://github.com/apache/mesos/blob/85981d7c728798783b2e1e7cbed4f27a1497ccb2/src/linux/cgroups.cpp#L1440 > > has never been executed before this patch (due to immediate termination of a TaskKiller process), but, after this change, will be executing in some cases?... As far as I can tell, this should not exercise any new code path, and should actually make unusual code paths less frequent. The code you're referring to is only involved when the freeze operation timed out, and attempts to thaw and freeze it again - this can happen in normal circumstances, i.e. in the normal case where a cgroup is destroyed. This specific code path was exercised in the past - I checked using logs and `strace`, and will continue to be exercised if the freeze times out. One way to view this change is that it can only delay the termination of the `TaskKiller` process: therefore, the `TaskKiller` will only terminate in a state in which it could already be terminated in in the current code. The extra delay is a grace period letting some time to the `TaskKiller` to finish its freeze/kill/thaw cycle, hence making it much less likely to terminate it while it's in the middle of its freeze/kill/thaw. > (Btw, inablilty to comment on an untouched line in a github PR is one of examples why, as of 2019-2020, Mesos committers were still preferring to review contribution using Apache Reviewboard. I'm not saying that we should not do something to simplify contributing via GitHub, just illustrating why the RB has not been phased out.) Interesting, thanks! > I'm sorry, but reviewing such stuff meaningfully takes significant chunks of an uninterrupted time and should only be done in a very sound mind and a very good health (this was not my case in the beginnig of this week). I'm really sorry to hear that, and hope you're feeling better! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
