OK, great that it is sorted out.
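By the way, the likely reason "trap cleanup EXIT" fired on the
start-connect-server.sh line but not on spark-shell or spark-submit:
those two run in the foreground and block until the job ends, whereas
start-connect-server.sh hands off to spark-daemon.sh, which backgrounds
the server and returns immediately. Your wrapper script then runs off
its end, the shell fires the EXIT trap, and cleanup tears down the
server it just started. A minimal sketch of one workaround, assuming the
Spark 3.5 sbin scripts (SPARK_NO_DAEMONIZE is the documented switch that
keeps spark-daemon.sh in the foreground):

    #!/bin/bash
    # cleanup() as defined in your skeleton, then:
    trap cleanup EXIT

    # With SPARK_NO_DAEMONIZE set, the Connect server stays in the
    # foreground, so the script (and hence the EXIT trap) completes
    # only when the server itself stops.
    export SPARK_NO_DAEMONIZE=true
    $SPARK_HOME/sbin/start-connect-server.sh \
        --packages org.apache.spark:spark-connect_2.12:3.5.4 \
        --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"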
Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


On Fri, 24 Jan 2025 at 15:27, Andrew Petersen <aapet...@ncsu.edu> wrote:

> Thank you Mich
>
> It seems the reason my script was not working is the cleanup function,
> which goes something like the skeleton I pasted below. The purpose of
> this function is to stop all of the Spark processes if there is an
> error or the job ends. If I comment out "trap cleanup EXIT", it works
> perfectly. Do you have any idea why the last line (with
> start-connect-server.sh) seems to be triggering the trap? Other lines
> such as
>
> $SPARK_HOME/bin/spark-shell --master \
>     spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
> or
>
> $SPARK_HOME/bin/spark-submit --master \
>     spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
> work fine in the same script with the trap/cleanup function engaged.
>
> -----------
> trap cleanup EXIT
>
> # stop spark
> function cleanup()
> {
>     ...
>     do
>         if [ "$HOST_STR" = "" ]; then
>         ...
>         blaunch -z $HOST_STR ./lsf-stop-spark.sh
>     ...
>
> $SPARK_HOME/sbin/start-connect-server.sh \
>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>     --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
>
> On Thu, Jan 23, 2025 at 3:02 PM Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
>
>> Try this
>>
>> $SPARK_HOME/sbin/start-connect-server.sh \
>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>     --master spark://<spark-master-hostname>:<spark-master-port>
>>
>> Note that the Spark Connect server might not have access to the
>> environment variables you set in your shell session. It needs the
>> complete hostname or IP address to locate the Spark master.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>> On Thu, 23 Jan 2025 at 19:39, Andrew Petersen <aapet...@ncsu.edu> wrote:
>>
>>> OK, so would the Spark Connect server connect to the Spark cluster
>>> like this:
>>>
>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>>     --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT
>>>
>>> ?
>>>
>>> On Thu, Jan 23, 2025 at 1:35 PM Mich Talebzadeh
>>> <mich.talebza...@gmail.com> wrote:
>>>
>>>> Well
>>>>
>>>> 1. Spark workers do not connect directly to the Spark Connect server.
>>>> 2. Client applications use the Spark Connect API to interact with
>>>>    the Spark cluster through the server.
>>>> 3. I suggest focusing on developing client applications that
>>>>    leverage the Spark Connect API.
>>>>
>>>> The difference between Spark Standalone and Spark Connect: in
>>>> Standalone mode, workers connect directly to the Spark master using
>>>> the spark:// protocol and the master's hostname and port. Spark
>>>> Connect uses a different architecture: client applications connect
>>>> to the Spark Connect server, which then interacts with the Spark
>>>> cluster on their behalf.
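>>>>
>>>> As a sketch of that client path, once the Connect server is up a
>>>> session can be opened along these lines (the hostname is a
>>>> placeholder; 15002 is the default Spark Connect port):
>>>>
>>>> # an interactive PySpark client going through Spark Connect
>>>> $SPARK_HOME/bin/pyspark --remote "sc://<connect-server-host>:15002"
>>>>
>>>> The same sc:// URL works from a standalone application via
>>>> SparkSession.builder.remote(...).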
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>> On Thu, 23 Jan 2025 at 18:05, Andrew Petersen
>>>> <aapet...@ncsu.edu.invalid> wrote:
>>>>
>>>>> Hello Spark community
>>>>>
>>>>> I am trying to connect a worker to the Spark Connect server.
>>>>>
>>>>> Following the documentation, I am able to get the Spark Connect
>>>>> server to run in the simple single-node way.
>>>>>
>>>>> Am I correct to assume that the Spark Connect server can work with
>>>>> Spark workers? If so, how do I connect a Spark worker to a Spark
>>>>> Connect server? I have a standalone Spark setup, and I am used to
>>>>> using scripts that start worker daemons and connect them to the
>>>>> master. I tried connecting a worker to a Connect server the same
>>>>> way I would connect one to a master.
>>>>>
>>>>> First I start the Connect server:
>>>>>
>>>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4
>>>>>
>>>>> Then I try to connect a worker:
>>>>>
>>>>> $SPARK_HOME/sbin/spark-daemon.sh start \
>>>>>     org.apache.spark.deploy.worker.Worker 1 spark://nxxcxx:15002
>>>>>
>>>>> However, I get an error:
>>>>>
>>>>> 25/01/23 11:57:19 INFO Worker: Connecting to master cxxnxx:15002...
>>>>> 25/01/23 11:57:19 INFO TransportClientFactory: Successfully created
>>>>> connection to cxxnxx/192.xxx.x.xxx:15002 after 38 ms (0 ms spent in
>>>>> bootstraps)
>>>>> 25/01/23 11:57:20 WARN TransportChannelHandler: Exception in
>>>>> connection from cxxxnxx/192.xxx.x.xx:15002
>>>>> java.lang.IllegalArgumentException: Too large frame: 19808389169144
>>>>>
>>>>> --
>>>>> Andrew Petersen, PhD
>>>>> Advanced Computing, Office of Information Technology
>>>>> 2620 Hillsborough Street
>>>>> datascience.oit.ncsu.edu
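>>>>
>>>> PS. On the "Too large frame" error above: port 15002 is the Connect
>>>> server's gRPC endpoint, so the worker's Spark-RPC handshake almost
>>>> certainly misreads the gRPC reply's bytes as an absurdly large frame
>>>> length. Workers belong to the standalone master, not to the Connect
>>>> server. A minimal sketch of the intended layout, assuming the
>>>> default ports 7077 and 15002 and a placeholder master hostname:
>>>>
>>>> # 1. start a standalone master (RPC on port 7077 by default)
>>>> $SPARK_HOME/sbin/start-master.sh
>>>>
>>>> # 2. workers register with the master, never with the Connect server
>>>> $SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
>>>>
>>>> # 3. the Connect server attaches to the same master and serves
>>>> #    clients on port 15002
>>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>>>     --master spark://<master-host>:7077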