[jira] [Commented] (YUNIKORN-2895) Don't add duplicated allocation to node when the allocation ask fails

Wilfred Spiegelenburg (Jira) Sat, 19 Oct 2024 00:53:15 -0700


    [ https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891113#comment-17891113 ]


Wilfred Spiegelenburg commented on YUNIKORN-2895:
-------------------------------------------------

[~osev] :

TL;DR: I think we need a new Jira for OutOfCpu and similar failures. It is a 
special case that we need to handle, and we need a fresh, full set of logs and 
a state dump for it.
{quote}When we create a lot of pods at once we sometimes get OutOfCpu errors 
where it looks like the pods are being scheduled on machines where they don't 
fit.
{quote}
That is a known issue on GKE and is caused by GKE itself. Even the default 
scheduler runs into it.

GKE uses static pods for the kube proxy on the node. Cloud providers like AWS 
and Azure use daemon sets for this. The static pods cause the issue because a 
static pod is not really a pod in the API server. The node creates a "mirror 
pod" for it after startup. That mirror pod is used to make sure the resource 
usage of the node is accounted for correctly. In certain cases the node is too 
slow in creating this mirror pod but already marks itself ready for scheduling. 
That means that everything in the cluster, like the scheduler and the 
autoscaler, thinks the node has more space than it really has.
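
The accounting gap described above can be illustrated with a minimal sketch. 
The pod names, the CPU numbers and the free_cpu helper are all hypothetical. 
This is not Kubernetes or YuniKorn code, only the arithmetic of the two views 
of the same node:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_millis: int  # CPU request in millicores


def free_cpu(allocatable_millis, visible_pods):
    """Free CPU as computed from the pods a component can currently see."""
    return allocatable_millis - sum(p.cpu_millis for p in visible_pods)


# Hypothetical numbers, chosen only to show the gap.
allocatable = 2000
kube_proxy = Pod("kube-proxy-static", 100)  # static pod, mirror pod not created yet
new_pod = Pod("workload", 1950)

# The scheduler only sees pods in the API server, so the static pod is missing.
scheduler_view = free_cpu(allocatable, [])
# The node itself knows about the static pod and accounts for it at admission.
node_view = free_cpu(allocatable, [kube_proxy])

fits_for_scheduler = new_pod.cpu_millis <= scheduler_view
admitted_by_node = new_pod.cpu_millis <= node_view
```

With these assumed numbers the scheduler sees enough room for the pod, while 
the node does not, which is the mismatch that ends in the "OutOf..." state.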

The node admission phase does account for the usage of static pods. That is 
what causes this "OutOf..." state for pods. The k8shim should be told via the 
informer that the pod failed to be admitted to the node, and we should be able 
to remove the pod at that point. Either that does not happen, or we do not 
handle the pod status correctly when it does. The offending pod is not in the 
json file attached as part of the k8shim diagnostics. I only see it on the 
core side.
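
As a rough sketch of the handling described above, assuming the rejected pod 
shows up in an update with phase "Failed" and a reason that starts with 
"OutOf" (for example "OutOfcpu"). The PodStatus and AllocationTracker types 
and the on_pod_update method are hypothetical and are not the k8shim API:

```python
from dataclasses import dataclass, field


@dataclass
class PodStatus:
    phase: str
    reason: str = ""


def rejected_by_node(status):
    """True when the node refused the pod at admission (assumed status shape)."""
    return status.phase == "Failed" and status.reason.startswith("OutOf")


@dataclass
class AllocationTracker:
    # Hypothetical bookkeeping: pod UID -> node the pod was allocated to.
    allocations: dict = field(default_factory=dict)

    def on_pod_update(self, uid, status):
        """Release the allocation when the node rejected the pod.

        Returns True if an allocation was removed.
        """
        if rejected_by_node(status) and uid in self.allocations:
            del self.allocations[uid]
            return True
        return False


tracker = AllocationTracker({"uid-1": "node-a"})
released = tracker.on_pod_update("uid-1", PodStatus("Failed", "OutOfcpu"))
```

The point of the sketch is only that the release is driven by the pod status 
update, so that the shim and the core do not keep tracking a pod the node 
never accepted.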

See all the details in 
[https://github.com/kubernetes/kubernetes/issues/115325]. K8s does not really 
have a good idea of how to solve this yet. The problem has been known for at 
least 18 months...

> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
>                 Key: YUNIKORN-2895
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: orphaned_dataops_1.6_patched.json
>
>
> While revisiting the new update allocation logic, I found that a duplicated 
> allocation can be added to the node if the allocation is already 
> allocated: we try to add the allocation to the node again and do not 
> revert it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
