[
https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bacsko resolved YUNIKORN-2766.
------------------------------------
Fix Version/s: 1.6.0
Resolution: Fixed
> Only generate event if all predicates failed
> --------------------------------------------
>
> Key: YUNIKORN-2766
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Right now, we send an event to the pod as soon as a predicate fails for a single node:
> {noformat}
> if err := plugin.Predicates(&si.PredicatesArgs{
>     AllocationKey: allocationKey,
>     NodeID:        sn.NodeID,
>     Allocate:      allocate,
> }); err != nil {
>     log.Log(log.SchedNode).Debug("running predicates failed",
>         zap.String("allocationKey", allocationKey),
>         zap.String("nodeID", sn.NodeID),
>         zap.Bool("allocateFlag", allocate),
>         zap.Error(err))
>     // running predicates failed
>     msg := err.Error()
>     ask.LogAllocationFailure(msg, allocate)
>     ask.SendPredicateFailedEvent(msg)
>     return false
> }
> {noformat}
> This is, however, not correct. We should only generate an event if *all*
> predicates have failed, which means that the pod cannot be scheduled. A
> failing predicate for a given node can be perfectly normal in many cases.
> Instead, we should aggregate the failed predicates and send an event like:
> {noformat}
> All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7':
> node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints
> that the pod didn't tolerate (5x)
> {noformat}
> where 20x and 5x show how many times each predicate failure occurred.
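
The aggregation described above could be sketched roughly as follows. This is only an illustration under assumptions, not the change that was merged for this issue: the `predicateFailures` type and its `record` and `summary` methods are hypothetical names, and the ordering (most frequent message first) is a choice made here so that the output is deterministic. The idea is to count each distinct predicate error while nodes are tried and to build one message only once every node has been rejected.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// predicateFailures counts how often each predicate error message was seen
// while trying the nodes for one request.
// Hypothetical helper type, not part of the YuniKorn code base.
type predicateFailures map[string]int

// record notes one failed predicate check; nil errors are ignored.
func (p predicateFailures) record(err error) {
	if err != nil {
		p[err.Error()]++
	}
}

// summary builds the aggregated event text, most frequent message first.
// Ties are broken alphabetically so the result is stable.
func (p predicateFailures) summary(allocationKey string) string {
	msgs := make([]string, 0, len(p))
	for m := range p {
		msgs = append(msgs, m)
	}
	sort.Slice(msgs, func(i, j int) bool {
		if p[msgs[i]] != p[msgs[j]] {
			return p[msgs[i]] > p[msgs[j]]
		}
		return msgs[i] < msgs[j]
	})
	parts := make([]string, 0, len(msgs))
	for _, m := range msgs {
		parts = append(parts, fmt.Sprintf("%s (%dx)", m, p[m]))
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, "; "))
}

func main() {
	failures := predicateFailures{}
	// Simulate 20 nodes rejected by affinity and 5 rejected by taints.
	for i := 0; i < 20; i++ {
		failures.record(errors.New("node(s) didn't match Pod's node affinity/selector"))
	}
	for i := 0; i < 5; i++ {
		failures.record(errors.New("node(s) had taints that the pod didn't tolerate"))
	}
	// In a scheduler this would be sent only if no node passed the predicates.
	fmt.Println(failures.summary("345d70d7-243a-4077-a9f8-0bb76c3532d7"))
}
```

With this shape, the per-node code path would only call `record`, and the caller that iterates over the nodes would decide whether to send the event after the last node has been tried.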