GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/310

    [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure

    This PR introduces SharedSlot as a special Slot type, so that shared slots are released properly when an Instance is marked dead. This fixes a problem where a dead instance that had not been shut down properly prevented a job from being removed from the system, because the instance was not aware of the SubSlots.
    
    Adds test cases in which only the heartbeat thread of the TaskManager is killed.
    
    Apart from the test cases, this is essentially the same PR as #309, just rebased onto the current 0.8 release candidate.
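The release behavior described in the PR can be illustrated with a minimal Java sketch. The class names (`Instance`, `Slot`, `SharedSlot`, `SubSlot`) and `markDead` come from the description, but everything else here (fields, `allocateSubSlot`, `release`, `numberOfSubSlots`) is a hypothetical, simplified model and not Flink's actual runtime code. The point it shows: because SharedSlot is itself a Slot, marking the instance dead releases it like any other slot, and its release cascades to the sub-slots, so no task or job is left holding a slot on the dead instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not Flink's actual Slot/SharedSlot/Instance classes.
class Slot {
    private boolean released = false;

    // Marks this slot as released; subclasses extend this to release nested state.
    void release() {
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}

// A slice of a shared slot handed to a single task (hypothetical).
class SubSlot extends Slot {
}

class SharedSlot extends Slot {
    // Sub-slots handed out to the individual tasks sharing this slot.
    private final List<SubSlot> subSlots = new ArrayList<>();

    SubSlot allocateSubSlot() {
        SubSlot sub = new SubSlot();
        subSlots.add(sub);
        return sub;
    }

    // Releasing a shared slot also releases every sub-slot; otherwise the
    // tasks (and their job) holding those sub-slots would never be cleaned up.
    @Override
    void release() {
        for (SubSlot sub : subSlots) {
            sub.release();
        }
        subSlots.clear();
        super.release();
    }

    int numberOfSubSlots() {
        return subSlots.size();
    }
}

class Instance {
    private final List<Slot> allocatedSlots = new ArrayList<>();
    private boolean alive = true;

    void addSlot(Slot slot) {
        allocatedSlots.add(slot);
    }

    // Marking the instance dead releases every allocated slot. SharedSlots are
    // ordinary Slots here, so their release() cascades to their sub-slots.
    void markDead() {
        alive = false;
        for (Slot slot : allocatedSlots) {
            slot.release();
        }
        allocatedSlots.clear();
    }

    boolean isAlive() {
        return alive;
    }
}
```

In this model, a SharedSlot with two sub-slots registered on an instance ends up fully released, sub-slots included, after a single `markDead()` call.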

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSharedSlotReleaseRC2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #310
    
----
commit 2935a7ee19eddb48efb38a3a65c4afe5e1bba0d2
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-01-12T09:58:45Z

    [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure.
    
    Fixes a ConcurrentModificationException on SharedSlot's subSlots field by synchronizing all state-changing operations on the associated assignment group. Also fixes a deadlock: Instance.markDead first acquires the InstanceLock and then, while releasing the associated slots, the assignment group lock. This could block against a direct releaseSlot call on a SharedSlot, which first acquires the assignment group lock and then the instance lock in order to return the slot to the instance.
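The lock-ordering problem in the commit message can be sketched in Java. This is a hypothetical, simplified model and not the code from the commit: it assumes one lock object per assignment group (`AssignmentGroup.lock`) and one per instance (`instanceLock`), and the helper names `returnSlot`, `addSubSlot`, `slotCount`, and `subSlotCount` are invented for illustration. All state changes on a shared slot go through the group lock, which prevents concurrent modification of `subSlots`. The instance-then-group inversion is avoided by snapshotting the slots under the instance lock and releasing them only after that lock is dropped, so every path acquires the group lock before the instance lock. The commit message does not say which technique it uses; this is just one common way to remove such an inversion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model illustrating lock ordering; not Flink's code.
class AssignmentGroup {
    // Single lock guarding the state of all shared slots in this group.
    final Object lock = new Object();
}

class Instance {
    private final Object instanceLock = new Object();
    private final Set<SharedSlot> slots = new HashSet<>();

    void addSlot(SharedSlot slot) {
        synchronized (instanceLock) {
            slots.add(slot);
        }
    }

    // Called while the caller holds the group lock: order is group -> instance.
    void returnSlot(SharedSlot slot) {
        synchronized (instanceLock) {
            slots.remove(slot);
        }
    }

    // Copy the slots under the instance lock, then release them after dropping
    // it, so this thread never holds the instance lock while taking a group lock.
    void markDead() {
        List<SharedSlot> toRelease;
        synchronized (instanceLock) {
            toRelease = new ArrayList<>(slots);
        }
        for (SharedSlot slot : toRelease) {
            slot.releaseSlot();
        }
    }

    int slotCount() {
        synchronized (instanceLock) {
            return slots.size();
        }
    }
}

class SharedSlot {
    private final AssignmentGroup group;
    private final Instance instance;
    private final List<String> subSlots = new ArrayList<>();
    private boolean released = false;

    SharedSlot(AssignmentGroup group, Instance instance) {
        this.group = group;
        this.instance = instance;
    }

    // Mutations happen under the group lock, so no thread can modify subSlots
    // while another one is changing it.
    void addSubSlot(String name) {
        synchronized (group.lock) {
            if (!released) {
                subSlots.add(name);
            }
        }
    }

    // Idempotent release; lock order is always group -> instance.
    void releaseSlot() {
        synchronized (group.lock) {
            if (released) {
                return;
            }
            released = true;
            subSlots.clear();
            instance.returnSlot(this);
        }
    }

    boolean isReleased() {
        synchronized (group.lock) {
            return released;
        }
    }

    int subSlotCount() {
        synchronized (group.lock) {
            return subSlots.size();
        }
    }
}
```

Running `markDead()` on one thread while another thread calls `releaseSlot()` on the same slots completes in this model, because neither path can hold the instance lock while waiting for a group lock. The idempotent `released` flag makes it harmless for both threads to release the same slot.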

----


