[ https://issues.apache.org/jira/browse/YUNIKORN-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Craig Condit updated YUNIKORN-3058:
-----------------------------------
    Target Version: 1.8.0  (was: 1.7.0)

> [UMBRELLA] Support InPlacePodVerticalScaling (phase 3)
> ------------------------------------------------------
>
>                 Key: YUNIKORN-3058
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3058
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Craig Condit
>            Assignee: Craig Condit
>            Priority: Major
>
> Kubernetes 1.27 added a new [InPlacePodVerticalScaling|http://example.com/]
> feature. While this is currently still in an alpha state as of 1.32 (and
> therefore requires a feature flag to enable), it will be moved to beta in
> 1.33, meaning it will be enabled by default, and considered stable in an
> upcoming release. The implementation of this feature has implications for
> YuniKorn, as with the feature enabled, the requests and limits of a Pod are
> no longer immutable.
>
> Fortunately, the updated API objects that enable the feature contain the new
> fields so we can add initial support for the feature now. To enable the
> feature for testing in a Kind cluster, the Kind cluster configuration needs
> to contain the following:
> {noformat}
> kind: Cluster
> apiVersion: kind.x-k8s.io/v1alpha4
> featureGates:
>   "InPlacePodVerticalScaling": true
> {noformat}
> During scheduling of new pods, the requested resources are still used as
> before.
>
> However, once a pod has been started, the actual resource utilization needs
> to be tracked via a new {{Pod.Status.ContainerStatuses[].AllocatedResources}}
> collection. In addition, if the value of {{Pod.Status.Resize}} is set to
> {{Proposed}}, the usage of each container needs to be computed as the
> maximum of its requested and allocated resources. The requested resources
> field becomes mutable once this feature is turned on, and it represents the
> latest *requested* (not actual) usage of the container.
>
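The accounting rule described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not YuniKorn's implementation: `ResourceList`, `Container`, and `effectiveUsage` are hypothetical stand-ins for the real Kubernetes API types, and quantities are simplified to plain integers.

```go
package main

import "fmt"

// ResourceList is a hypothetical, simplified stand-in for a Kubernetes
// resource map: resource name -> quantity in arbitrary integer units.
type ResourceList map[string]int64

// Container is a hypothetical type carrying only the two values discussed
// above for a single container.
type Container struct {
	Requested ResourceList // latest requested resources (mutable with the feature on)
	Allocated ResourceList // empty until populated for a running pod
}

// effectiveUsage returns the usage to account for one container:
//   - requested resources if nothing has been allocated yet,
//   - allocated resources once they are populated,
//   - max(requested, allocated) per resource while a resize is proposed.
func effectiveUsage(c Container, resizeProposed bool) ResourceList {
	out := ResourceList{}
	if len(c.Allocated) == 0 {
		for name, qty := range c.Requested {
			out[name] = qty
		}
		return out
	}
	for name, qty := range c.Allocated {
		out[name] = qty
	}
	if resizeProposed {
		for name, qty := range c.Requested {
			if qty > out[name] {
				out[name] = qty
			}
		}
	}
	return out
}

func main() {
	c := Container{
		Requested: ResourceList{"cpu": 1000, "memory": 512},
		Allocated: ResourceList{"cpu": 500, "memory": 512},
	}
	fmt.Println(effectiveUsage(c, true))  // resize proposed: per-resource maximum
	fmt.Println(effectiveUsage(c, false)) // no resize pending: allocated values
}
```

Falling back to the requested values when nothing is allocated means the same function would also cover pods that have not started yet and clusters where the feature gate is off.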
> Supporting this feature is not optional within YuniKorn, as failure to
> process the updated resources will mean that we do not account for resource
> usage correctly if a pod is updated.
>
> Several steps will need to be taken to support this properly:
> * Ensure that GetPodResources() accurately computes the effective usage of
> the Pod in all cases. Since the AllocatedResources field will not be
> populated when this feature is not active, and is only set once the pod is in
> a running state, this is fairly straightforward and can be implemented even
> in clusters which do not have this feature enabled.
> * The result of GetPodResources() will need to be cached in the shim so that
> we can detect resource changes on Pod updates. Comparing the result of
> GetPodResources() on the new Pod vs. the existing version will allow us to
> easily detect changes.
> * If changes are detected to a running YuniKorn-managed pod, an update
> message will need to be sent from the core to change the resources of the
> allocated task.
> * If changes are detected to a running non-YuniKorn-managed pod, an update
> of the node utilized resources will need to be sent from the shim to the core.
> * The core *must not* reject these updates, even if they would cause a queue
> to go over capacity. Instead, they must be applied to the appropriate ask or
> allocation unconditionally.
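The caching and comparison step from the list above might look roughly like the following Go sketch. Everything here is hypothetical and for illustration only: `Resources`, `usageCache`, and its `update` method are invented names, the map is keyed by an assumed pod UID string, and the computed usage is assumed to come from something like GetPodResources().

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource map
// (resource name -> quantity in arbitrary integer units).
type Resources map[string]int64

// equal reports whether two resource maps hold the same entries.
func equal(a, b Resources) bool {
	if len(a) != len(b) {
		return false
	}
	for name, qty := range a {
		if other, ok := b[name]; !ok || other != qty {
			return false
		}
	}
	return true
}

// usageCache is a hypothetical cache of the last computed usage per pod.
type usageCache struct {
	byPod map[string]Resources
}

func newUsageCache() *usageCache {
	return &usageCache{byPod: make(map[string]Resources)}
}

// update stores the latest usage for a pod and reports whether it differs
// from the previously cached value. The first observation of a pod is not
// reported as a change, since there is nothing to compare against.
func (c *usageCache) update(podUID string, latest Resources) bool {
	prev, seen := c.byPod[podUID]
	c.byPod[podUID] = latest
	return seen && !equal(prev, latest)
}

func main() {
	cache := newUsageCache()
	fmt.Println(cache.update("pod-1", Resources{"cpu": 500}))  // first sighting
	fmt.Println(cache.update("pod-1", Resources{"cpu": 500}))  // unchanged
	fmt.Println(cache.update("pod-1", Resources{"cpu": 1000})) // resized
}
```

In this sketch a `true` result is the point at which an update would be forwarded, and per the last item in the list that update would be applied unconditionally rather than validated against queue capacity.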