[ https://issues.apache.org/jira/browse/SPARK-51320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930906#comment-17930906 ]
Bobby Wang commented on SPARK-51320:
------------------------------------

Closing this task: pyspark-connect==4.0.0.dev2 is quite old and does not include the latest Spark Connect ML features. I installed a locally compiled pyspark-client instead, and it worked very well (a sketch of that build-and-install flow is appended at the end of this message).

> Failed to run Spark ML on Connect with pyspark-connect==4.0.0.dev2
> installation
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-51320
>                 URL: https://issues.apache.org/jira/browse/SPARK-51320
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, ML, PySpark
>    Affects Versions: 4.0.0, 4.1
>            Reporter: Bobby Wang
>            Priority: Major
>
> After deploying the Spark Connect server with
>
> {code:bash}
> $SPARK_HOME/sbin/start-connect-server.sh \
>   --master local[*] \
>   --jars $SPARK_HOME/jars/spark-connect_2.13-4.1.0-SNAPSHOT.jar
> {code}
>
> I installed only the pyspark-connect package instead of the full Spark distribution:
>
> {code:bash}
> pip install pyspark-connect==4.0.0.dev2
> {code}
>
> Then I ran the code below:
>
> {code:python}
> from pyspark.ml.classification import (LogisticRegression,
>                                        LogisticRegressionModel)
> from pyspark.ml.linalg import Vectors
> from pyspark.sql import SparkSession
>
> spark = (SparkSession.builder.remote("sc://localhost")
>          .getOrCreate())
>
> df = spark.createDataFrame([
>     (Vectors.dense([1.0, 2.0]), 1),
>     (Vectors.dense([2.0, -1.0]), 1),
>     (Vectors.dense([-3.0, -2.0]), 0),
>     (Vectors.dense([-1.0, -2.0]), 0),
> ], schema=['features', 'label'])
>
> lr = LogisticRegression(maxIter=19, tol=0.0023)
> model = lr.fit(df)
>
> print(f"======== model.intercept: {model.intercept}")
> print(f"======== model.coefficients: {model.coefficients}")
> model.transform(df).show()
> {code}
>
> It threw the errors below:
>
> Traceback (most recent call last):
>   File "run-demo.py", line 16, in <module>
>     lr = LogisticRegression(maxIter=19, tol=0.0023)
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
>     return func(self, **kwargs)
>            ^^^^^^^^^^^^^^^^^^^^
>   File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/classification.py", line 1317, in __init__
>     self._java_obj = self._new_java_obj(
>                      ^^^^^^^^^^^^^^^^^^^
>   File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 81, in _new_java_obj
>     from pyspark.core.context import SparkContext
> ModuleNotFoundError: No module named 'pyspark.core'
> Exception ignored in: <function JavaWrapper.__del__ at 0x7d32d1fcf2e0>
> Traceback (most recent call last):
>   File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 51, in __del__
>     from pyspark.core.context import SparkContext
> ModuleNotFoundError: No module named 'pyspark.core'
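As a follow-up to the comment above, here is a minimal sketch of the local build-and-install flow. The build command and the python/packaging/client path are assumptions based on the Spark 4.x source layout, not taken from this report; verify them against your branch before relying on them.

{code:bash}
# Minimal sketch, assuming a Spark 4.x source checkout; paths may differ per branch.

# Build Spark from the local checkout so the Connect server jars exist
# (the same jars passed to start-connect-server.sh above).
./build/mvn -DskipTests clean package

# Build an sdist of the pure-Python Spark Connect client (pyspark-client).
# Assumption: the client packaging lives under python/packaging/client here.
cd python/packaging/client
python setup.py sdist

# Install the locally built client in place of pyspark-connect==4.0.0.dev2.
pip install dist/*.tar.gz
{code}

With a client built this way installed, the LogisticRegression example above would be expected to run against the Connect server without the pyspark.core import error, consistent with the note that the locally compiled pyspark-client worked very well.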