Dear Spark Community,

I've been exploring the capabilities of the Spark Connect Server and encountered an issue when trying to launch it in cluster deploy mode with Kubernetes as the master.
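In short, the submit path rejects this combination outright. As a minimal sketch of the behavior I observed (illustrative Python only; Spark's real check is Scala code in SparkSubmit and raises org.apache.spark.SparkException, not a Python exception):

```python
# Illustrative sketch of the submit-time validation I ran into.
# Hypothetical simplification: the real logic lives in
# core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala.

def validate_connect_submit(master, deploy_mode):
    """Reject the combinations SparkSubmit does not allow for the
    Spark Connect server, regardless of the cluster manager."""
    if deploy_mode == "cluster":
        # Mirrors the error text quoted in this thread.
        raise RuntimeError(
            "Cluster deploy mode is not applicable to Spark Connect server.")
    return (master, deploy_mode)

# A k8s:// master with client deploy mode passes this particular check;
# it is the cluster deploy mode that trips it.
validate_connect_submit("k8s://https://example-apiserver:6443", "client")
```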
While initiating the `start-connect-server.sh` script with the `--conf` parameter for `spark.master` and `spark.submit.deployMode`, I was met with an error message:

```
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is not applicable to Spark Connect server.
```

This error message can be traced back to Spark's source code here:
https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L307

Given my observations, I'm curious about the Spark Connect Server roadmap: is there a plan or ongoing conversation to enable Kubernetes as a master in Spark Connect Server's cluster deploy mode? I have tried to gather information from existing JIRA tickets, but have not been able to find a definitive answer:

https://issues.apache.org/jira/browse/SPARK-42730
https://issues.apache.org/jira/browse/SPARK-39375
https://issues.apache.org/jira/browse/SPARK-44117

Any thoughts, updates, or references to similar conversations or initiatives would be greatly appreciated. Thank you for your time and expertise!

Best regards,
Yasukazu

On Tue, 5 Sept 2023 at 12:09, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:

> Hello Mich,
> Thank you for your questions. Here are my responses:
>
> > 1. What investigation have you done to show that it is running in local mode?
>
> I have verified through the History Server's Environment tab that:
> - "spark.master" is set to local[*]
> - "spark.app.id" begins with local-xxx
> - "spark.submit.deployMode" is set to local
>
> > 2. who has configured this kubernetes cluster? Is it supplied by a cloud vendor?
>
> Our Kubernetes cluster was set up in an on-prem environment using RKE2 (https://docs.rke2.io/).
>
> > 3. Confirm that you have configured Spark Connect Server correctly for cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes) and other relevant Spark configurations in your Spark job submission.
>
> Based on the Spark Connect documentation I've read, there don't seem to be any specific settings for cluster mode related to the Spark Connect Server.
>
> Configuration - Spark 3.4.1 Documentation
> https://spark.apache.org/docs/3.4.1/configuration.html#spark-connect
>
> Quickstart: Spark Connect — PySpark 3.4.1 documentation
> https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
>
> Spark Connect Overview - Spark 3.4.1 Documentation
> https://spark.apache.org/docs/latest/spark-connect-overview.html
>
> The documentation only suggests running ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0, leaving me at a loss.
>
> > 4. Can you provide a full spark submit command
>
> Given the nature of Spark Connect, I don't use the spark-submit command. Instead, as per the documentation, I can execute workloads using only a Python script. For the Spark Connect Server, I have a Kubernetes manifest executing "/opt/spark/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0".
>
> > 5. Make sure that the Python client script connecting to Spark Connect Server specifies the cluster mode explicitly, like using --master or --deploy-mode flags when creating a SparkSession.
>
> The Spark Connect Server operates as the driver, so it isn't possible to specify the --master or --deploy-mode flags in the Python client script. If I try, I encounter a RuntimeError like this:
>
> RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [sc://.../]
>
> > 6. Ensure that you have allocated the necessary resources (CPU, memory etc) to Spark Connect Server when running it on Kubernetes.
>
> Resources are ample, so that shouldn't be the problem.
>
> > 7. Review the environment variables and configurations you have set, including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are not conflicting with cluster mode settings.
>
> I'm unsure if SPARK_NO_DAEMONIZE=1 conflicts with cluster mode settings. But without it, the process goes to the background when executing start-connect-server.sh, causing the Pod to terminate prematurely.
>
> > 8. Are you using the correct spark client version that is fully compatible with your spark on the server?
>
> Yes, I have verified that without using Spark Connect (e.g., using Spark Operator), Spark applications run as expected.
>
> > 9. check the kubernetes error logs
>
> The Kubernetes logs don't show any errors, and jobs are running in local mode.
>
> > 10. Insufficient resources can lead to the application running in local mode
>
> I wasn't aware that insufficient resources could lead to local-mode execution. Thank you for pointing it out.
>
> Best regards,
> Yasukazu
>
> On Tue, 5 Sept 2023 at 01:28, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Personally I have not used this feature myself. However, some points:
>>
>> 1. What investigation have you done to show that it is running in local mode?
>> 2. who has configured this kubernetes cluster? Is it supplied by a cloud vendor?
>> 3. Confirm that you have configured Spark Connect Server correctly for cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes) and other relevant Spark configurations in your Spark job submission.
>> 4. Can you provide a full spark submit command
>> 5. Make sure that the Python client script connecting to Spark Connect Server specifies the cluster mode explicitly, like using --master or --deploy-mode flags when creating a SparkSession.
>> 6. Ensure that you have allocated the necessary resources (CPU, memory etc) to Spark Connect Server when running it on Kubernetes.
>> 7. Review the environment variables and configurations you have set, including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are not conflicting with cluster mode settings.
>> 8. Are you using the correct spark client version that is fully compatible with your spark on the server?
>> 9. check the kubernetes error logs
>> 10. Insufficient resources can lead to the application running in local mode
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Distinguished Technologist, Solutions Architect & Engineer
>> London
>> United Kingdom
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>
>> On Mon, 4 Sept 2023 at 04:57, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:
>>
>>> Hi Cley,
>>>
>>> Thank you for taking the time to respond to my query. Your insights on Spark cluster deployment are much appreciated.
>>>
>>> However, I'd like to clarify that my specific challenge is related to running the Spark Connect Server on Kubernetes in Cluster Mode. While I understand the general deployment strategies for Spark on Kubernetes, I am seeking guidance particularly on the Spark Connect Server aspect.
>>>
>>> cf. Spark Connect Overview - Spark 3.4.1 Documentation
>>> https://spark.apache.org/docs/latest/spark-connect-overview.html
>>>
>>> To reiterate, when I connect from an external Python client and execute scripts, the server operates in Local Mode instead of the expected Kubernetes Cluster Mode (with master as k8s://... and deploy-mode set to cluster).
>>>
>>> If I've misunderstood your initial response and it was indeed related to Spark Connect, I sincerely apologize for the oversight.
>>> In that case, could you please expand a bit on the Spark Connect-specific aspects?
>>>
>>> Do you, or anyone else in the community, have experience with this specific setup, or have you encountered a similar issue with Spark Connect Server on Kubernetes? Any targeted advice or guidance would be invaluable.
>>>
>>> Thank you again for your time and help.
>>>
>>> Best regards,
>>> Yasukazu
>>>
>>> On Mon, 4 Sept 2023 at 00:23, Cleyson Barros <euroc...@gmail.com> wrote:
>>>
>>>> Hi Nagatomi,
>>>> Use the Apache images, then run your master node, then start your worker nodes. You can add a command line in the Dockerfiles to call the master using the Docker container names in your service composition. If you wish to run 2 masters (active and standby), follow the instructions in the Apache docs for this configuration; the recipe is the same except for how you start the masters and how you expect your cluster to behave.
>>>> I hope it helps.
>>>> Have a nice day :)
>>>> Cley
>>>>
>>>> On Saturday, 2 Sept 2023 at 15:37, Nagatomi Yasukazu <yassan0...@gmail.com> wrote:
>>>>
>>>>> Hello Apache Spark community,
>>>>>
>>>>> I'm currently trying to run Spark Connect Server on Kubernetes in Cluster Mode and facing some challenges. Any guidance or hints would be greatly appreciated.
>>>>>
>>>>> ## Environment:
>>>>> Apache Spark version: 3.4.1
>>>>> Kubernetes version: 1.23
>>>>> Command executed:
>>>>> /opt/spark/sbin/start-connect-server.sh \
>>>>>   --packages org.apache.spark:spark-connect_2.13:3.4.1,org.apache.iceberg:iceberg-spark-runtime-3.4_2.13:1.3.1...
>>>>> Note that I'm running it with the environment variable SPARK_NO_DAEMONIZE=1.
>>>>>
>>>>> ## Issue:
>>>>> When I connect from an external Python client and run scripts, it operates in Local Mode instead of the expected Cluster Mode.
>>>>>
>>>>> ## Expected Behavior:
>>>>> When connecting from a Python client to the Spark Connect Server, I expect it to run in Cluster Mode.
>>>>>
>>>>> If anyone has any insights, advice, or has faced a similar issue, I'd be grateful for your feedback.
>>>>> Thank you in advance.
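For anyone skimming this thread later: the client-side RuntimeError mentioned above (a Spark master cannot be combined with a Spark Connect URL) can be illustrated with a small sketch. This is a hypothetical helper, not PySpark's actual implementation; the host name is a placeholder:

```python
# Illustrative sketch of the client-side rule discussed in this thread:
# a Spark Connect remote URL and spark.master are mutually exclusive on
# the client, because master/deploy-mode belong to the server process.
# Hypothetical helper, not PySpark's real code.

def resolve_session_target(remote=None, master=None):
    """Return (mode, target) for a would-be session, raising the same
    kind of error PySpark reports when both are configured."""
    if remote and master:
        raise RuntimeError(
            "Spark master cannot be configured with Spark Connect server; "
            "however, found URL for Spark Connect [%s]" % remote)
    if remote:
        return ("connect", remote)  # deploy mode is decided server-side
    return ("classic", master or "local[*]")

# The client only names the Connect endpoint (placeholder host):
print(resolve_session_target(remote="sc://connect-server.example.com:15002"))
```

This matches what I observed: the only knob the Python client has is the sc:// endpoint, which is why the deploy mode must be fixed where the server itself is launched.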