-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54495/
-----------------------------------------------------------
Review request for mesos and Vinod Kone.
Bugs: MESOS-6676
https://issues.apache.org/jira/browse/MESOS-6676
Repository: mesos
Description
-------
The master neglected to re-link with the scheduler in the following
scenario:
* The master sees a re-registration attempt from a PID-based scheduler.
* The scheduler was previously registered with the master.
* The "force" flag is not set.
For example, this might happen if:
* The master sees an ExitedEvent for the framework and marks it
disconnected.
* The master sends a FrameworkErrorMessage to the framework but this
message is dropped, e.g., due to a transient network failure.
* The scheduler attempts to re-register with the master, e.g., because
it detects (spuriously) that the current leading master has changed.
Skipping the re-link is problematic because it might leave the
master -> scheduler connection using an ephemeral socket rather than a
persistent link.
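
The scenario above can be sketched with a toy model. This is not the code in src/master/master.cpp, and it does not use libprocess. `ToyMaster`, `Framework`, `Pid`, `links`, `exited` and `reregister` are hypothetical names made up for illustration, and the sketch assumes the scheduler re-registers from the same PID it used before. It only shows the intended invariant: after a re-registration is accepted, the master holds a link to the scheduler whether or not "force" was set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a scheduler PID, e.g. "scheduler@host:port".
using Pid = std::string;

// Hypothetical, heavily simplified framework record.
struct Framework {
  Pid pid;
  bool connected;
};

// Toy model of the master's bookkeeping (an assumption, not the real class).
struct ToyMaster {
  std::map<std::string, Framework> frameworks;  // Keyed by framework ID.
  std::set<Pid> links;                          // PIDs we hold a link to.

  // Models establishing a persistent link to `pid`.
  void link(const Pid& pid) { links.insert(pid); }

  // Models an ExitedEvent: the link is gone and the framework is
  // marked disconnected.
  void exited(const Pid& pid) {
    links.erase(pid);
    for (auto& entry : frameworks) {
      if (entry.second.pid == pid) {
        entry.second.connected = false;
      }
    }
  }

  // Models a (re-)registration attempt from a PID-based scheduler.
  void reregister(const std::string& id, const Pid& from, bool force) {
    const bool known = frameworks.count(id) > 0;
    Framework& framework = frameworks[id];

    // A new framework, or a forced failover, adopts the sender's PID.
    if (!known || force) {
      framework.pid = from;
    }

    framework.connected = true;

    // The point of the fix: re-link even when `force` is not set.
    link(framework.pid);
    assert(links.count(framework.pid) == 1);
  }
};
```

In this model, calling `exited()` and then `reregister()` with `force == false` leaves the scheduler's PID back in `links`. Without the unconditional `link()` call the PID would stay out of `links`, which corresponds to the situation described above.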
Diffs
-----
src/master/master.cpp 67f32229470da4cf7953881d1c5dcb99393002de
Diff: https://reviews.apache.org/r/54495/diff/
Testing
-------
`make check`
Note that it would be _great_ to write a unit test for this situation (as well
as for a class of related failure conditions), but the current testing
infrastructure doesn't make that easy.
Thanks,
Neil Conway