Bobby Wang created SPARK-51320:
----------------------------------

             Summary: Failed to run spark ml on connect with pyspark-connect==4.0.0.dev2 installation
                 Key: SPARK-51320
                 URL: https://issues.apache.org/jira/browse/SPARK-51320
             Project: Spark
          Issue Type: Bug
          Components: Connect, ML, PySpark
    Affects Versions: 4.0.0, 4.1
            Reporter: Bobby Wang
After deploying the Spark Connect server with

{code:java}
$SPARK_HOME/sbin/start-connect-server.sh \
  --master local[*] \
  --jars $SPARK_HOME/jars/spark-connect_2.13-4.1.0-SNAPSHOT.jar{code}

{color:#172b4d}I installed only the pyspark-connect package instead of the full pyspark distribution:{color}

{code:java}
pip install pyspark-connect==4.0.0.dev2{code}

Then I ran the code below:

{code:java}
from pyspark.ml.classification import (LogisticRegression, LogisticRegressionModel)
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = (SparkSession.builder.remote("sc://localhost")
         .getOrCreate())

df = spark.createDataFrame([
    (Vectors.dense([1.0, 2.0]), 1),
    (Vectors.dense([2.0, -1.0]), 1),
    (Vectors.dense([-3.0, -2.0]), 0),
    (Vectors.dense([-1.0, -2.0]), 0),
], schema=['features', 'label'])

lr = LogisticRegression(maxIter=19, tol=0.0023)
model = lr.fit(df)

print(f"======== model.intercept: {model.intercept}")
print(f"======== model.coefficients: {model.coefficients}")

model.transform(df).show(){code}

It threw the errors below:

{code:java}
Traceback (most recent call last):
  File "run-demo.py", line 16, in <module>
    lr = LogisticRegression(maxIter=19, tol=0.0023)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
    return func(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/classification.py", line 1317, in __init__
    self._java_obj = self._new_java_obj(
                     ^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 81, in _new_java_obj
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'

Exception ignored in: <function JavaWrapper.__del__ at 0x7d32d1fcf2e0>
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 51, in __del__
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'{code}
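For reference, the missing module can be confirmed directly from the same environment. Below is a minimal diagnostic sketch (assuming the pyspark-connect-only install described above; the comments about the expected output are my assumption): pyspark.core ships only with the full pyspark distribution, so the classic JVM-backed wrapper in pyspark/ml/wrapper.py has nothing to import when only the connect client is installed.

{code:java}
# Diagnostic sketch (assumes the pyspark-connect-only environment described above).
# pyspark.core is part of the full pyspark distribution, so the JVM-backed ML
# wrapper's "from pyspark.core.context import SparkContext" cannot be satisfied here.
import importlib.util

print(importlib.util.find_spec("pyspark.core"))         # expected: None with pyspark-connect only
print(importlib.util.find_spec("pyspark.sql.connect"))  # expected: a ModuleSpec (connect client is present){code}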