-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72831/
-----------------------------------------------------------
(Updated Sept. 8, 2020, 11:48 p.m.)
Review request for mesos and Greg Mann.
Bugs: MESOS-9609
https://issues.apache.org/jira/browse/MESOS-9609
Repository: mesos
Description
-------
Per MESOS-9609, it's possible for the master to encounter a CHECK
failure during agent removal in the following situation:
1. Given a framework with checkpoint == false, with only
executor(s) (no tasks) running on an agent.
2. When this agent disconnects from the master,
Master::removeFramework(Slave*, Framework*) removes the
tasks and executors. However, when there are no tasks, this
function accidentally inserts an empty entry into
Master::Slave::tasks, because operator[] creates a default
entry when the key is missing.
3. If the framework is then removed, Slave::tasks still holds
an entry for which there is no corresponding framework.
4. When the agent is removed, a CHECK fails because the
framework cannot be found.
This fixes the issue by avoiding the accidental insertion.
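
The accidental insertion described in step 2 can be reproduced in
isolation. The sketch below is not the Mesos code: it assumes
std::unordered_map as a stand-in for the container behind
Slave::tasks, and the type aliases and helper names (TaskMap,
countTasksWithInsert, countTasksWithoutInsert, demonstrate) are
hypothetical, chosen only to show why operator[] is unsafe for a
read-only lookup and how find() avoids the side effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the master's bookkeeping; not the Mesos types.
using FrameworkID = std::string;
using TaskID = std::string;
using TaskMap =
    std::unordered_map<FrameworkID, std::unordered_map<TaskID, int>>;

// Problematic pattern: operator[] default-constructs and inserts an
// empty inner map when the framework has no entry yet.
std::size_t countTasksWithInsert(TaskMap& tasks, const FrameworkID& id) {
  return tasks[id].size();
}

// Safer pattern: find() never modifies the map, so a framework with
// no tasks leaves no stale entry behind.
std::size_t countTasksWithoutInsert(const TaskMap& tasks,
                                    const FrameworkID& id) {
  auto it = tasks.find(id);
  return it == tasks.end() ? 0 : it->second.size();
}

// Walks through the scenario for a framework that has no tasks.
void demonstrate() {
  TaskMap tasks;

  // Lookup via find(): the map stays empty.
  assert(countTasksWithoutInsert(tasks, "framework-1") == 0);
  assert(tasks.empty());

  // Lookup via operator[]: same answer, but an entry now exists
  // for a framework that never had any tasks.
  assert(countTasksWithInsert(tasks, "framework-1") == 0);
  assert(tasks.count("framework-1") == 1);
}
```

Calling demonstrate() from a main() exercises both paths. Taking the
map by const reference in the lookup helper is a deliberate choice:
operator[] is not available on a const map, so the compiler rejects
the accidental insertion.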
Diffs
-----
src/master/master.cpp 02723296e569fac9d553b1494a5ca7daa6ef9aa4
Diff: https://reviews.apache.org/r/72831/diff/1/
Testing
-------
See subsequent patch.
Thanks,
Benjamin Mahler