OK, great that it is sorted out.
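By the way, the likely reason "trap cleanup EXIT" fired on the
start-connect-server.sh line but not on spark-shell or spark-submit:
those two run in the foreground and block until the job ends, whereas
start-connect-server.sh hands off to spark-daemon.sh, which backgrounds
the server and returns immediately. Your wrapper script then runs off
its end, the shell fires the EXIT trap, and cleanup tears down the
server it just started. A minimal sketch of one workaround, assuming the
Spark 3.5 sbin scripts (SPARK_NO_DAEMONIZE is the documented switch that
keeps spark-daemon.sh in the foreground):

    #!/bin/bash
    # cleanup() as defined in your skeleton, then:
    trap cleanup EXIT

    # With SPARK_NO_DAEMONIZE set, the Connect server stays in the
    # foreground, so the script (and hence the EXIT trap) completes
    # only when the server itself stops.
    export SPARK_NO_DAEMONIZE=true
    $SPARK_HOME/sbin/start-connect-server.sh \
        --packages org.apache.spark:spark-connect_2.12:3.5.4 \
        --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"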
Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


On Fri, 24 Jan 2025 at 15:27, Andrew Petersen <aapet...@ncsu.edu> wrote:

> Thank you Mich
>
> It seems the reason my script was not working is the cleanup function,
> which goes something like the skeleton I pasted below. The purpose of
> this function is to stop all of the Spark processes if there is an
> error or the job ends. If I comment out "trap cleanup EXIT", it works
> perfectly. Do you have any idea why the last line (with
> start-connect-server.sh) seems to be triggering the trap? Other lines
> such as
>
> $SPARK_HOME/bin/spark-shell --master \
>     spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
> or
>
> $SPARK_HOME/bin/spark-submit --master \
>     spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
> work fine in the same script with the trap/cleanup function engaged.
>
> -----------
> trap cleanup EXIT
>
> # stop spark
> function cleanup()
> {
>     ...
>     do
>         if [ "$HOST_STR" = "" ]; then
>         ...
>         blaunch -z $HOST_STR ./lsf-stop-spark.sh
>     ...
>
> $SPARK_HOME/sbin/start-connect-server.sh \
>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>     --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT "$@"
>
>
> On Thu, Jan 23, 2025 at 3:02 PM Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
>
>> Try this
>>
>> $SPARK_HOME/sbin/start-connect-server.sh \
>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>     --master spark://<spark-master-hostname>:<spark-master-port>
>>
>> Note that the Spark Connect server might not have access to the
>> environment variables you set in your shell session. It needs the
>> complete hostname or IP address to locate the Spark master.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>> On Thu, 23 Jan 2025 at 19:39, Andrew Petersen <aapet...@ncsu.edu> wrote:
>>
>>> OK, so would the Spark Connect server connect to the Spark cluster
>>> like this:
>>>
>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>>     --master spark://$SPARK_MASTER:$SPARK_MASTER_PORT
>>>
>>> ?
>>>
>>> On Thu, Jan 23, 2025 at 1:35 PM Mich Talebzadeh
>>> <mich.talebza...@gmail.com> wrote:
>>>
>>>> Well
>>>>
>>>> 1. Spark workers do not connect directly to the Spark Connect server.
>>>> 2. Client applications use the Spark Connect API to interact with
>>>>    the Spark cluster through the server.
>>>> 3. I suggest focusing on developing client applications that
>>>>    leverage the Spark Connect API.
>>>>
>>>> The difference between Spark Standalone and Spark Connect: in
>>>> Standalone mode, workers connect directly to the Spark master using
>>>> the spark:// protocol and the master's hostname and port. Spark
>>>> Connect uses a different architecture: client applications connect
>>>> to the Spark Connect server, which then interacts with the Spark
>>>> cluster on their behalf.
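>>>>
>>>> As a sketch of that client path, once the Connect server is up a
>>>> session can be opened along these lines (the hostname is a
>>>> placeholder; 15002 is the default Spark Connect port):
>>>>
>>>> # an interactive PySpark client going through Spark Connect
>>>> $SPARK_HOME/bin/pyspark --remote "sc://<connect-server-host>:15002"
>>>>
>>>> The same sc:// URL works from a standalone application via
>>>> SparkSession.builder.remote(...).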
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>> On Thu, 23 Jan 2025 at 18:05, Andrew Petersen
>>>> <aapet...@ncsu.edu.invalid> wrote:
>>>>
>>>>> Hello Spark community
>>>>>
>>>>> I am trying to connect a worker to the Spark Connect server.
>>>>>
>>>>> Following the documentation, I am able to get the Spark Connect
>>>>> server to run in the simple single-node way.
>>>>>
>>>>> Am I correct to assume that the Spark Connect server can work with
>>>>> Spark workers? If so, how do I connect a Spark worker to a Spark
>>>>> Connect server? I have a standalone Spark setup, and I am used to
>>>>> using scripts that start worker daemons and connect them to the
>>>>> master. I tried connecting a worker to a Connect server the same
>>>>> way I would connect one to a master.
>>>>>
>>>>> First I start the Connect server:
>>>>>
>>>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4
>>>>>
>>>>> Then I try to connect a worker:
>>>>>
>>>>> $SPARK_HOME/sbin/spark-daemon.sh start \
>>>>>     org.apache.spark.deploy.worker.Worker 1 spark://nxxcxx:15002
>>>>>
>>>>> However, I get an error:
>>>>>
>>>>> 25/01/23 11:57:19 INFO Worker: Connecting to master cxxnxx:15002...
>>>>> 25/01/23 11:57:19 INFO TransportClientFactory: Successfully created
>>>>> connection to cxxnxx/192.xxx.x.xxx:15002 after 38 ms (0 ms spent in
>>>>> bootstraps)
>>>>> 25/01/23 11:57:20 WARN TransportChannelHandler: Exception in
>>>>> connection from cxxxnxx/192.xxx.x.xx:15002
>>>>> java.lang.IllegalArgumentException: Too large frame: 19808389169144
>>>>>
>>>>> --
>>>>> Andrew Petersen, PhD
>>>>> Advanced Computing, Office of Information Technology
>>>>> 2620 Hillsborough Street
>>>>> datascience.oit.ncsu.edu
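>>>>
>>>> PS. On the "Too large frame" error above: port 15002 is the Connect
>>>> server's gRPC endpoint, so the worker's Spark-RPC handshake almost
>>>> certainly misreads the gRPC reply's bytes as an absurdly large frame
>>>> length. Workers belong to the standalone master, not to the Connect
>>>> server. A minimal sketch of the intended layout, assuming the
>>>> default ports 7077 and 15002 and a placeholder master hostname:
>>>>
>>>> # 1. start a standalone master (RPC on port 7077 by default)
>>>> $SPARK_HOME/sbin/start-master.sh
>>>>
>>>> # 2. workers register with the master, never with the Connect server
>>>> $SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
>>>>
>>>> # 3. the Connect server attaches to the same master and serves
>>>> #    clients on port 15002
>>>> $SPARK_HOME/sbin/start-connect-server.sh \
>>>>     --packages org.apache.spark:spark-connect_2.12:3.5.4 \
>>>>     --master spark://<master-host>:7077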