Wilfred Spiegelenburg created YUNIKORN-2737:
-----------------------------------------------
Summary: Cleanup handleFailApplicationEvent handling
Key: YUNIKORN-2737
URL: https://issues.apache.org/jira/browse/YUNIKORN-2737
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
When we handle a failed application in the shim in
{{handleFailApplicationEvent()}}, we call the placeholder cleanup.
This raises three issues:
* The cleanup takes the manager lock first and then needs the app lock. The app
lock is already held when we process the event. We should place the cleanup
last so that we do not hold the manager lock for longer than needed.
* Failing an application is triggered by the core, which should already do the
cleanup, so this might be redundant to start with.
* The failure handling also marks unassigned pods as failed, which means there
is an overlap between the failure handling and the placeholder cleanup that we
should remove. Either ignore all placeholders in the failure handling or only
clean up assigned placeholders.
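
A minimal Go sketch of one possible shape for the cleaned-up handler, to make
the ordering and the overlap concrete. It is not the shim code: the types
{{pod}}, {{application}} and {{manager}}, the function names and the state
strings are all hypothetical stand-ins. It assumes the second option from the
last bullet (failure handling covers every unassigned pod, cleanup covers only
assigned placeholders) and calls the cleanup as the final step so the manager
lock is held as briefly as possible.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical stand-in for a pod/task tracked by the shim.
type pod struct {
	name        string
	placeholder bool
	assigned    bool
	state       string
}

// application is a hypothetical stand-in for the shim application object.
type application struct {
	sync.Mutex
	state string
	pods  []*pod
}

// manager is a hypothetical stand-in for the placeholder manager.
type manager struct {
	sync.Mutex
	released []string
}

// cleanUpAssigned releases only placeholders that are already assigned.
// It takes the manager lock and assumes the caller holds the app lock.
func (m *manager) cleanUpAssigned(app *application) {
	m.Lock()
	defer m.Unlock()
	for _, p := range app.pods {
		if p.placeholder && p.assigned {
			p.state = "released"
			m.released = append(m.released, p.name)
		}
	}
}

// handleFailApplication models the event handler. The app lock is taken here
// to mimic "the app lock is already held when we process the event".
func handleFailApplication(app *application, m *manager) {
	app.Lock()
	defer app.Unlock()

	app.state = "Failed"
	// Failure handling: mark every unassigned pod (placeholder or not) failed.
	for _, p := range app.pods {
		if !p.assigned {
			p.state = "failed"
		}
	}
	// Cleanup goes last, so the manager lock is only held for this final step,
	// and it skips unassigned placeholders that were handled above.
	m.cleanUpAssigned(app)
}

func main() {
	app := &application{pods: []*pod{
		{name: "ph-assigned", placeholder: true, assigned: true, state: "bound"},
		{name: "ph-pending", placeholder: true, state: "pending"},
		{name: "real-pending", state: "pending"},
	}}
	m := &manager{}
	handleFailApplication(app, m)
	for _, p := range app.pods {
		fmt.Println(p.name, p.state)
	}
}
```

With this split each pod is touched by exactly one of the two code paths. The
other option from the last bullet would be the mirror image: skip placeholders
in the failure loop and let the cleanup handle all of them.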
--
This message was sent by Atlassian Jira
(v8.20.10#820010)