[ 
https://issues.apache.org/jira/browse/YUNIKORN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2766.
------------------------------------
    Fix Version/s: 1.6.0
       Resolution: Fixed

> Only generate event if all predicates failed
> --------------------------------------------
>
>                 Key: YUNIKORN-2766
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2766
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.6.0
>
>
> Right now, we send an event to the pod if a predicate failed:
> {noformat}
>               if err := plugin.Predicates(&si.PredicatesArgs{
>                       AllocationKey: allocationKey,
>                       NodeID:        sn.NodeID,
>                       Allocate:      allocate,
>               }); err != nil {
>                       log.Log(log.SchedNode).Debug("running predicates failed",
>                               zap.String("allocationKey", allocationKey),
>                               zap.String("nodeID", sn.NodeID),
>                               zap.Bool("allocateFlag", allocate),
>                               zap.Error(err))
>                       // running predicates failed
>                       msg := err.Error()
>                       ask.LogAllocationFailure(msg, allocate)
>                       ask.SendPredicateFailedEvent(msg)
>                       return false
>               }
> {noformat}
> This is, however, not correct. We should only generate an event if the
> predicates have failed on *all* nodes, which means that the pod cannot be
> scheduled. A failing predicate for a given node can be perfectly normal in
> many cases.
> Instead, we should aggregate the failed predicates and send an event like:
> {noformat}
> All predicates failed for request '345d70d7-243a-4077-a9f8-0bb76c3532d7': 
> node(s) didn't match Pod's node affinity/selector (20x); node(s) had taints 
> that the pod didn't tolerate (5x)
> {noformat}
> where 20x and 5x indicate how many times each predicate failed.
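
The aggregation described above could be sketched roughly as follows. This is only an illustration, not the code from the actual fix: the predicateErrors type and its add and message methods are hypothetical names, and the sketch assumes the scheduler records one failure reason per node and sends the event only after every node has been rejected.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// predicateErrors is a hypothetical helper (not a YuniKorn type) that counts
// how often each predicate failure reason was seen while trying nodes.
type predicateErrors struct {
	counts map[string]int
}

func newPredicateErrors() *predicateErrors {
	return &predicateErrors{counts: make(map[string]int)}
}

// add records one predicate failure reported for a single node.
func (p *predicateErrors) add(reason string) {
	p.counts[reason]++
}

// message builds the aggregated event text, most frequent reason first
// (ties are broken alphabetically). It returns "" if nothing was recorded.
func (p *predicateErrors) message(allocationKey string) string {
	if len(p.counts) == 0 {
		return ""
	}
	reasons := make([]string, 0, len(p.counts))
	for r := range p.counts {
		reasons = append(reasons, r)
	}
	sort.Slice(reasons, func(i, j int) bool {
		ci, cj := p.counts[reasons[i]], p.counts[reasons[j]]
		if ci != cj {
			return ci > cj
		}
		return reasons[i] < reasons[j]
	})
	parts := make([]string, 0, len(reasons))
	for _, r := range reasons {
		parts = append(parts, fmt.Sprintf("%s (%dx)", r, p.counts[r]))
	}
	return fmt.Sprintf("All predicates failed for request '%s': %s",
		allocationKey, strings.Join(parts, "; "))
}

func main() {
	errs := newPredicateErrors()
	// Simulate 25 nodes that all rejected the pod, for two different reasons.
	for i := 0; i < 20; i++ {
		errs.add("node(s) didn't match Pod's node affinity/selector")
	}
	for i := 0; i < 5; i++ {
		errs.add("node(s) had taints that the pod didn't tolerate")
	}
	// In a scheduler, this message would be sent as a single event only
	// after no node passed the predicates.
	fmt.Println(errs.message("345d70d7-243a-4077-a9f8-0bb76c3532d7"))
}
```

Counting reasons in a map keeps the per-node path cheap, and sorting only when the message is built makes the event text deterministic, so repeated scheduling attempts produce identical events.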



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
