Volodymyr Kot created YUNIKORN-2735:
---------------------------------------
Summary: YuniKorn doesn't schedule correctly after some pods were
marked as Unschedulable
Key: YUNIKORN-2735
URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
Project: Apache YuniKorn
Issue Type: Bug
Reporter: Volodymyr Kot
Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
It is a bit of an edge case, but I can consistently reproduce this on master -
see the steps and commands used below:
# Create a new cluster with kind, with 4 CPUs/8 GB of memory
# Deploy YuniKorn using helm
# Set up service account for Spark
## "kubectl create serviceaccount spark"
## "kubectl create clusterrolebinding spark-role --clusterrole=edit
--serviceaccount=default:spark --namespace=default"
# Run "kubectl proxy" to be able to run spark-submit
# Create Spark application* 1 with driver and 2 executors - fits fully,
placeholders are created and replaced
# Create Spark application 2 with driver and 2 executors - only one executor
placeholder is scheduled, rest of the pods are marked Unschedulable
# Delete one of the executors from application 1
# The Spark driver re-creates the executor, which is then marked Unschedulable
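The cluster setup in steps 1-4 can be sketched roughly as the script below. This is only an illustration under assumptions, not the exact commands used for the report: the cluster name, the Helm release name, and the "yunikorn" namespace are hypothetical, and the chart from the release repository is used although the report was reproduced on master (which would need a locally built image). kind has no CPU/memory flags, so the 4 CPU/8 GB limit is assumed to be configured on the Docker VM. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4. Names below are assumptions, not taken from the report.
CLUSTER_NAME="${CLUSTER_NAME:-yunikorn-repro}"   # hypothetical cluster name
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands only, 0 = execute

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

setup_cluster() {
  # Step 1: the 4 CPU / 8 GB limit is assumed to be set on the Docker VM.
  run kind create cluster --name "$CLUSTER_NAME"
  # Step 2: install YuniKorn from the release chart (assumption: not a master build).
  run helm repo add yunikorn https://apache.github.io/yunikorn-release
  run helm repo update
  run helm install yunikorn yunikorn/yunikorn --namespace yunikorn --create-namespace
  # Step 3: service account and role binding for Spark, as quoted in the steps.
  run kubectl create serviceaccount spark
  run kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
  # Step 4: kubectl proxy blocks, so it is left for a separate terminal (default port 8001).
  echo "# then, in another terminal: kubectl proxy"
}
```

Calling "setup_cluster" prints the commands for review; "DRY_RUN=0 setup_cluster" runs them.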
At that point the scheduler is "stuck" and won't schedule either the executor
from application 1 or the executor placeholder from application 2; it deems
both of them unschedulable. See the attached logs, and please let me know if I
misunderstood something or if this is expected behavior!
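To check which pods are in the state described above, something like the following could be used. It is a sketch that assumes the pods carry a PodScheduled condition with reason "Unschedulable" and live in the "default" namespace; the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Print "<pod name><TAB><PodScheduled reason>" for every pod in a namespace.
# Pods that are already scheduled have an empty reason.
pod_schedule_reasons() {
  kubectl get pods --namespace "${1:-default}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PodScheduled")].reason}{"\n"}{end}'
}

# Read "<name><TAB><reason>" lines on stdin and keep only the names of
# pods whose reason is Unschedulable.
filter_unschedulable() {
  awk -F'\t' '$2 == "Unschedulable" { print $1 }'
}
```

Usage: "pod_schedule_reasons default | filter_unschedulable". Running it before and after deleting the executor shows whether the re-created executor and the application 2 placeholder both stay in the list.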
*Script used to run spark-submit:
{code:java}
${SPARK_HOME}/bin/spark-submit \
--master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.executor.request.cores=0.5 \
--conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driver.podTemplateFile=/Volumes/git/future/driver.yml \
--conf spark.kubernetes.executor.podTemplateFile=/Volumes/git/future/executor.yml \
local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000
{code}