Hi all,

I am currently using a Spark standalone cluster, which is functioning as
expected. Users are able to connect to the cluster and submit jobs without
any issues.



I am also testing the Spark Connect capability, which will allow external
clients to submit jobs to the cluster. To start the Spark Connect server, I
am running `$SPARK_HOME/sbin/start-connect-server.sh --packages
org.apache.spark:spark-connect_2.12:3.5.1` on the Spark master node. The
command executes without any errors, which suggests the Spark Connect
server started successfully.
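
For reference, this is the kind of low-level check I have been using to see
whether anything is listening on the Spark Connect port at all (a sketch
assuming the default port 15002, which I have not overridden):

import socket

host = "<standalone cluster master IP>"  # placeholder for my master host
try:
    # 15002 is the default Spark Connect gRPC port; adjust if the server
    # was started with a different port.
    with socket.create_connection((host, 15002), timeout=5):
        print("port 15002 is open")
except OSError as e:
    print("connect failed:", e)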



However, I am unable to access the Spark UI to verify the Spark Connect
server's status. Can someone please provide guidance on how to confirm that
Spark Connect is functioning properly?
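
In case it helps frame the question: my understanding is that the Connect
server runs as an ordinary Spark application, so I would expect its driver
UI on port 4040 of the master host (an assumption on my part, and only if
no other application has already claimed that port). A minimal probe along
those lines:

import urllib.request

url = "http://<standalone cluster master IP>:4040"  # placeholder host
try:
    # If the Connect server's driver UI is up, this should return HTTP 200.
    with urllib.request.urlopen(url, timeout=5) as resp:
        print("UI reachable, HTTP status:", resp.status)
except OSError as e:
    print("UI not reachable:", e)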



Additionally, when I run the Spark code snippet below in a Jupyter Notebook
to test Spark Connect, I get the following error. If anyone is familiar
with this issue or can help resolve it, I would greatly appreciate it.



import os

import pyspark
import pandas
import pyarrow
import grpc_status
import grpc
import torch

from pyspark.sql import SparkSession

# Point SPARK_HOME at my standalone installation (path elided here).
os.environ["SPARK_HOME"] = "/path/to/my/standalone/pyspark/cluster"

spark = SparkSession.builder.remote("sc://<standalone cluster master IP>").getOrCreate()
# spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()



Error:

/pyspark/sql/connect/session.py:185: UserWarning: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4::15002: Failed to connect to remote host: Connection refused"
        debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4::15002: Failed to connect to remote host: Connection refused", grpc_status:14, created_time:"2024-07-19T14:48:49.12739279+08:00"}"
>
  warnings.warn(str(e))
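
One detail I notice in the message is the target "ipv4::15002": the host
portion between "ipv4:" and ":15002" is empty, which makes me wonder
whether the host in my connection string is being picked up at all. Here is
the variant I plan to try next, with the host and default port spelled out
explicitly (the host is still a placeholder for my setup):

from pyspark.sql import SparkSession

# Same smoke test, but with host and port given explicitly in the
# connection string; spark.range() forces a round trip to the server.
spark = SparkSession.builder.remote(
    "sc://<standalone cluster master IP>:15002"  # placeholder host
).getOrCreate()
print(spark.version)
print(spark.range(5).collect())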





Thanks,
Elango
