> On Feb. 15, 2018, 11:55 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 7647-7654 (patched)
> > <https://reviews.apache.org/r/65482/diff/3/?file=1961521#file1961521line7647>
> >
> >     I'm sitting here trying to think of ways we might avoid crashing if the 
> > framework subscribes before the operation becomes terminal...
> >     
> >     Would it be reasonable to add an `if (framework == nullptr)` check to 
> > `updateOperation()` so that we only recover resources if the framework is 
> > known to the master?

Er... wait that doesn't make sense :) I guess when we receive the operation 
update, we have no way of knowing whether or not the framework had subscribed 
when the master learned about the pending operation. As a workaround for now, 
we could store in a set the operation UUIDs of operations for which we do not 
track allocated resources (i.e., operations which hit this block of code). 
Then, in `updateOperation` we could avoid recovering resources if the 
operation's UUID is in the set?


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review197641
-----------------------------------------------------------


On Feb. 14, 2018, 2:21 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2018, 2:21 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
>     https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch fixes the handling of non-terminal operations learned by a
> newly elected master after a master failover, so that only these
> operations are counted as using resources. Previously we did not count
> any operations as using resources which by accident produced expected
> behavior if the operation was already terminal when the master learned
> about them.
> 
> We do not address the issue of being unable to properly account for
> operations triggered by frameworks unknown to the master, see
> MESOS-8582. Instead we emit a warning for now since the master might
> continue to abort due to assertion failures due to incomplete resource
> accounting.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp b06d7a6e2fbbb81b97eaf537d5b6745c73dc867d 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/3/
> 
> 
> Testing
> -------
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Reply via email to