[
https://issues.apache.org/jira/browse/YUNIKORN-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035035#comment-18035035
]
Peter Bacsko edited comment on YUNIKORN-3148 at 11/3/25 2:43 PM:
-----------------------------------------------------------------
Based on my analysis so far, what happens to {{userHeadroom}} is not a concern,
at least when it comes to modifications. We receive a copy from
{{Manager.Headroom()}} anyway and this is new for each application.
However, we do need to think about whether this is correct or not. Calling code:
{noformat}
func (sq *Queue) GetQueueOutstandingRequests(total *[]*Allocation) {
if sq.IsLeafQueue() {
headRoom := sq.getMaxHeadRoom()
// while calculating outstanding requests, we calculate all the requests
that can fit into the queue's headroom,
// all these requests are qualified to trigger the up scaling.
for _, app := range sq.sortApplications(false) {
// calculate the users' headroom
userHeadroom := ugm.GetUserManager().Headroom(app.queuePath,
app.ApplicationID, app.user) <-- always new object, even if user is the same
app.getOutstandingRequests(headRoom, userHeadroom, total)
}{noformat}
I do think we need to save the {{userHeadroom}} object for every user in a map
and then re-use it when needed:
{noformat}
userHeadrooms := make(map[string]*resources.Resource)
for _, app := range sq.sortApplications(false) {
userHeadroom := userHeadrooms[app.user]
if userHeadroom == nil {
uhr := ugm.GetUserManager().Headroom(app.queuePath,
app.ApplicationID, app.user)
userHeadrooms[app.user] = uhr
userHeadroom = uhr
}
app.getOutstandingRequests(headRoom, userHeadroom, total)
}
{noformat}
was (Author: pbacsko):
Based on my analysis so far, what happens to {{userHeadroom}} is not a concern,
at least when it comes to modifications. We got a copy from
{{Manager.Headroom()}} anyway and this is new for each application.
However, we do need to think about whether this is correct or not. Calling code:
{noformat}
func (sq *Queue) GetQueueOutstandingRequests(total *[]*Allocation) {
if sq.IsLeafQueue() {
headRoom := sq.getMaxHeadRoom()
// while calculating outstanding requests, we calculate all the requests
that can fit into the queue's headroom,
// all these requests are qualified to trigger the up scaling.
for _, app := range sq.sortApplications(false) {
// calculate the users' headroom
userHeadroom := ugm.GetUserManager().Headroom(app.queuePath,
app.ApplicationID, app.user) <-- always new object, even if user is the same
app.getOutstandingRequests(headRoom, userHeadroom, total)
}{noformat}
I do think we need to save the {{userHeadroom}} object for every user in a map
and then re-use it when needed:
{noformat}
userHeadrooms := make(map[string]*resources.Resource)
for _, app := range sq.sortApplications(false) {
userHeadroom := userHeadrooms[app.user]
if userHeadroom == nil {
uhr := ugm.GetUserManager().Headroom(app.queuePath,
app.ApplicationID, app.user)
userHeadrooms[app.user] = uhr
userHeadroom = uhr
}
app.getOutstandingRequests(headRoom, userHeadroom, total)
}
{noformat}
> Incorrect headroom calculation when collecting outstanding requests
> -------------------------------------------------------------------
>
> Key: YUNIKORN-3148
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3148
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Blocker
>
> YUNIKORN-2794 introduced a bug in the Application code. We're no longer
> mutating the {{headRoom}} object in-place in
> {{{}Application.getOutstandingRequests(){}}}, instead, we re-assign its value
> to a newly created object.
> Before:
> {noformat}
> headRoom.SubOnlyExisting(request.GetAllocatedResource())
> userHeadRoom.SubOnlyExisting(request.GetAllocatedResource()) {noformat}
> After:
> {noformat}
> headRoom = resources.SubOnlyExisting(headRoom, request.GetAllocatedResource())
> userHeadRoom = resources.SubOnlyExisting(userHeadRoom,
> request.GetAllocatedResource()) {noformat}
> Problem is, this does not change the object pointed to outside the function,
> so every iteration in {{Queue.GetQueueOutstandingRequests()}} starts with the
> original {{headRoom}} value for every application. This leads to undesired
> behavior, because we'll end up collecting more asks than needed, which in
> turn triggers unnecessary cluster upscale.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]