Could you file an official Spark JIRA issue with those reproducers (your
CLI command, YAML files, error messages)?

In addition, it would be helpful if you could describe how you set up your K8s
cluster, so that we can rule out a cluster-side issue.

Once the JIRA exists, we can continue the discussion there.

BTW, the Spark driver pod is a normal pod, so it is best to start with a Pod
YAML file that deploys successfully on its own before using `spark-submit`.
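As a side note, some of those Gatekeeper requirements may be addressable from
the `spark-submit` side without a pod template. For example, if I read the
"Running Spark on Kubernetes" documentation correctly, the size limit on the
default local-dir `emptyDir` can be set via the volume confs. This is an
untested sketch: the volume name `spark-local-dir-1` comes from your error
message, and `1Gi` is only a placeholder.

```shell
# Untested sketch: give Spark's default local-dir emptyDir a size limit via
# volume confs instead of the pod template. Values are placeholders.
--conf spark.kubernetes.driver.volumes.emptyDir.spark-local-dir-1.options.sizeLimit=1Gi \
--conf spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.sizeLimit=1Gi
# You may also need the matching ...spark-local-dir-1.mount.path confs.
```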

In other words, please attach a sample `busybox`-image Pod YAML file which
passes all validations on your K8s cluster.
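For reference, a minimal sketch of such a file might look like the following
(untested; the name, image, and values are placeholders, and the probe and
securityContext settings are only meant to satisfy the Gatekeeper policies
listed in your error message):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: policy-check          # placeholder name
spec:
  securityContext:            # pod level: psp-pods-allowed-user-ranges
    runAsUser: 1001
    runAsGroup: 101
    fsGroup: 2000
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
      securityContext:        # container level: restricted-capabilities
        capabilities:
          drop: ["ALL"]
      livenessProbe:          # must-have-probes
        exec:
          command: ["true"]   # always succeeds; replace with a real check
        initialDelaySeconds: 60
        periodSeconds: 10
      readinessProbe:
        exec:
          command: ["true"]
        initialDelaySeconds: 60
        periodSeconds: 10
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 1Gi        # k8s-emptydir-size: every emptyDir needs a limit
```

If `kubectl apply --dry-run=server -f pod.yaml` passes your admission
webhooks, the same settings should carry over to the Spark pod template.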

Thanks,
Dongjoon.


On Fri, Jan 3, 2025 at 12:35 PM jilani shaik <jilani2...@gmail.com> wrote:

> Thanks, Dongjoon for the details.
>
> I have added an almost identical YAML file as the template reference file,
> and I am getting the error below:
>
> Exception in thread "main"
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
> POST at: https://k8sURL/api/v1/namespaces/namespace1/pods. Message: Forbidden!
> User user123 doesn't have permission. admission webhook "
> validation.gatekeeper.sh" denied the request: [must-have-probes]
> Container <spark-kubernetes-driver> in your <Pod>
> <spark-pi-bd7ea2942dd485c3-driver> has no <livenessProbe>
>
> [must-have-probes] Container <spark-kubernetes-driver> in your <Pod>
> <spark-pi-bd7ea2942dd485c3-driver> has no <readinessProbe>
>
> [psp-pods-allowed-user-ranges] Container spark-kubernetes-driver is
> attempting to run without a required securityContext/runAsUser
>
> [restricted-capabilities] container <spark-kubernetes-driver> is not
> dropping all required capabilities. Container must drop all of ["KILL",
> "MKNOD", "SYS_CHROOT"] or "ALL"
>
> [k8s-emptydir-size] emptyDir volume <spark-local-dir-1> must have a size
> limit.
>
>        at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)
>        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:518)
>        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
>        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)
>        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:703)
>        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:92)
>        at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
>        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1108)
>        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:92)
>        at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6(KubernetesClientApplication.scala:256)
>        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6$adapted(KubernetesClientApplication.scala:250)
>        at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
>        at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
>        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
>        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:250)
>        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:223)
>        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
>        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
>        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
>        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> My spark-submit command is like the following, used along with the pod
> template file:
>
> spark-submit --verbose \
>   --master k8s://https://k8surl \
>   --deploy-mode cluster \
>   --name spark-pi \
>   --properties-file location1/spark-defaults.conf \
>   --num-executors 5 \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=user123 \
>   --conf spark.kubernetes.namespace=namespace1 \
>   --conf spark.kubernetes.authenticate.caCertFile=location1/ca.crt \
>   --conf spark.kubernetes.authenticate.oauthTokenFile=location1/token1 \
>   --conf spark.kubernetes.authenticate.submission.caCertFile=location1/ca.crt \
>   --conf spark.kubernetes.authenticate.submission.oauthTokenFile=location1/token1 \
>   --conf spark.kubernetes.file.upload.path=/tmp/ \
>   --conf spark.kubernetes.driver.podTemplateFile=location1/k8s/spark/driver.yaml \
>   --conf spark.kubernetes.executor.podTemplateFile=location1/k8s/spark/driver.yaml \
>   --class org.apache.spark.examples.SparkPi \
>   spark-3.5.3-bin-hadoop3/examples/jars/spark-examples_2.12-3.5.3.jar 100
>
> My template YAML file is the same as the one provided in the Apache Spark
> GitHub repository, with the additional details below:
>
> livenessProbe:
>   failureThreshold: 3
>   exec:
>     command:
>       - touch
>       - /tmp/healthy
>   initialDelaySeconds: 60
>   periodSeconds: 10
>   successThreshold: 1
>   timeoutSeconds: 1
> readinessProbe:
>   failureThreshold: 3
>   exec:
>     command:
>       - touch
>       - /tmp/healthy
>   initialDelaySeconds: 60
>   periodSeconds: 10
>   successThreshold: 1
>   timeoutSeconds: 1
>
> I also restricted the security context capabilities for the pods via the
> YAML file, including run-as-user:
>
> securityContext:  # under the container-level YAML entry
>   capabilities:
>     drop:
>       - MKNOD
>       - KILL
>       - SYS_CHROOT
>
> securityContext:  # at the pod level, as a sibling of the containers entry
>   fsGroup: 2000
>   runAsGroup: 101
>   runAsUser: 1001
>
> Thanks,
>
> Jilani
>
> On Fri, Jan 3, 2025 at 12:41 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Could you elaborate on what you mean by `not working`?
>>
>> > but it's not working.
>>
>> For the following question, Spark expects a normal Pod YAML file.
>> You may want to take a look at the Apache Spark GitHub repository.
>>
>> > I do not have a  sample template file
>>
>> For example, the following files are used during K8s integration tests.
>>
>>
>> https://github.com/apache/spark/tree/master/resource-managers/kubernetes/integration-tests/src/test/resources
>>
>> 1. driver-schedule-template.yml
>> 2. driver-template.yml
>> 3. executor-template.yml
>>
>> Dongjoon.
>>
>> On Thu, Jan 2, 2025 at 12:07 PM jilani shaik <jilani2...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to run Spark on a Kubernetes cluster, but that cluster has
>>> certain validations for deploying any pod, and they are not allowing me
>>> to run my spark-submit.
>>>
>>> For example, I need to add liveness and readiness probes and certain
>>> security capability restrictions, which we usually do for all other pods
>>> via a YAML file.
>>>
>>> I am not sure how to do that with spark-submit on K8s. I tried the driver
>>> and executor template files, but it's not working. At the same time, I do
>>> not have a sample template file from the documentation, except the lines
>>> below:
>>>
>>> --conf spark.kubernetes.driver.podTemplateFile=s3a://bucket/driver.yml
>>> --conf spark.kubernetes.executor.podTemplateFile=s3a://bucket/executor.yml
>>>
>>>
>>> Can someone provide directions on how to proceed further?
>>>
>>> Thanks,
>>> Jilani
>>>
>>
