[ 
https://issues.apache.org/jira/browse/YUNIKORN-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035035#comment-18035035
 ] 

Peter Bacsko commented on YUNIKORN-3148:
----------------------------------------

Based on my analysis so far, what happens to {{userHeadroom}} is not a concern, 
at least when it comes to modifications. We got a copy from 
{{Manager.Headroom()}} anyway and this is new for each application.

However, we do need to think about whether this is correct or not. Calling code:
{noformat}
func (sq *Queue) GetQueueOutstandingRequests(total *[]*Allocation) {
    if sq.IsLeafQueue() {
       headRoom := sq.getMaxHeadRoom()
       // while calculating outstanding requests, we calculate all the requests 
that can fit into the queue's headroom,
       // all these requests are qualified to trigger the up scaling.
       for _, app := range sq.sortApplications(false) {
          // calculate the users' headroom
          userHeadroom := ugm.GetUserManager().Headroom(app.queuePath, 
app.ApplicationID, app.user)  <-- always new object, even if user is the same
          app.getOutstandingRequests(headRoom, userHeadroom, total)
       }{noformat}
I do think we need to save the {{userHeadroom}} object for every user in a map 
and then re-use it when needed:
{noformat}
        userHeadrooms := make(map[string]*resources.Resource)
        for _, app := range sq.sortApplications(false) {
          userHeadroom := userHeadrooms[app.user]
          if userHeadroom  == nil {
            uhr := ugm.GetUserManager().Headroom(app.queuePath, 
app.ApplicationID, app.user)
            userHeadrooms[app.user] = uhr
            userHeadroom = uhr
          }

          app.getOutstandingRequests(headRoom, userHeadroom, total)
        }
 {noformat}

> Incorrect headroom calculation when collecting outstanding requests
> -------------------------------------------------------------------
>
>                 Key: YUNIKORN-3148
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3148
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>
>  YUNIKORN-2794 introduced a bug in the Application code. We're no longer 
> mutating the {{headRoom}} object in-place in 
> {{{}Application.getOutstandingRequests(){}}}, instead, we re-assign its value 
> to a newly created object.
> Before:
> {noformat}
> headRoom.SubOnlyExisting(request.GetAllocatedResource())
> userHeadRoom.SubOnlyExisting(request.GetAllocatedResource()) {noformat}
> After:
> {noformat}
> headRoom = resources.SubOnlyExisting(headRoom, request.GetAllocatedResource())
> userHeadRoom = resources.SubOnlyExisting(userHeadRoom, 
> request.GetAllocatedResource())  {noformat}
> Problem is, this does not change the object pointed to outside the function, 
> so every iteration in {{Queue.GetQueueOutstandingRequests()}} starts with the 
> original {{headRoom}} value for every application. This leads to undesired 
> behavior, because we'll end up collecting more asks than needed, which in 
> turn triggers unnecessary cluster upscale.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to