Bobby Wang created SPARK-51320:
----------------------------------

             Summary: Failed to run Spark ML on Connect with a pyspark-connect==4.0.0.dev2 installation
                 Key: SPARK-51320
                 URL: https://issues.apache.org/jira/browse/SPARK-51320
             Project: Spark
          Issue Type: Bug
          Components: Connect, ML, PySpark
    Affects Versions: 4.0.0, 4.1
            Reporter: Bobby Wang


After deploying the Spark Connect server with:

{code:bash}
$SPARK_HOME/sbin/start-connect-server.sh \
  --master local[*] \
  --jars $SPARK_HOME/jars/spark-connect_2.13-4.1.0-SNAPSHOT.jar{code}
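For context, the server comes up fine, and once a Connect client is installed, plain DataFrame traffic against sc://localhost (default Connect port 15002) works, so the failure reported below appears specific to pyspark.ml. A minimal sanity-check sketch:

{code:python}
# Sanity-check sketch (assumes a Spark Connect client, e.g. pyspark-connect,
# is installed): plain DataFrame operations over the Connect server succeed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(5).show()  # works; only the classic pyspark.ml wrappers fail below
spark.stop()
{code}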
 

I installed just the pyspark-connect package, rather than the full pyspark distribution, with:

{code:bash}
pip install pyspark-connect==4.0.0.dev2{code}
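Note that the slim pyspark-connect distribution omits the JVM-backed pyspark.core package by design. A quick sketch for confirming what the installed package does and does not ship:

{code:python}
# Sketch: inspect the slim pyspark-connect install. The JVM-backed
# pyspark.core package is expected to be missing, while the Connect
# client modules are present.
import importlib.util

print(importlib.util.find_spec("pyspark.core"))         # None under pyspark-connect
print(importlib.util.find_spec("pyspark.sql.connect"))  # a ModuleSpec
{code}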
 

 

Then I ran the code below:

{code:python}
from pyspark.ml.classification import (LogisticRegression,
                                       LogisticRegressionModel)
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession
spark = (SparkSession.builder.remote("sc://localhost")
         .getOrCreate())
df = spark.createDataFrame([
        (Vectors.dense([1.0, 2.0]), 1),
        (Vectors.dense([2.0, -1.0]), 1),
        (Vectors.dense([-3.0, -2.0]), 0),
        (Vectors.dense([-1.0, -2.0]), 0),
        ], schema=['features', 'label'])
lr = LogisticRegression(maxIter=19, tol=0.0023)
model = lr.fit(df)
print(f"======== model.intercept: {model.intercept}")
print(f"======== model.coefficients: {model.coefficients}")
model.transform(df).show()
{code}
 

It threw the errors below:

{noformat}
Traceback (most recent call last):
  File "run-demo.py", line 16, in <module>
    lr = LogisticRegression(maxIter=19, tol=0.0023)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/__init__.py", line 115, in wrapper
    return func(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/classification.py", line 1317, in __init__
    self._java_obj = self._new_java_obj(
                     ^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 81, in _new_java_obj
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'
Exception ignored in: <function JavaWrapper.__del__ at 0x7d32d1fcf2e0>
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/pyspark-connect/lib/python3.11/site-packages/pyspark/ml/wrapper.py", line 51, in __del__
    from pyspark.core.context import SparkContext
ModuleNotFoundError: No module named 'pyspark.core'
{noformat}
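So the constructor fails before any Connect RPC is issued: JavaWrapper._new_java_obj in pyspark/ml/wrapper.py unconditionally imports pyspark.core.context, and that module only ships with the full pyspark package. Installing full pyspark presumably sidesteps the error, but the slim pyspark-connect install should arguably route pyspark.ml to the Connect implementation instead. If it helps triage, a hedged workaround sketch, assuming the torch-based estimators under pyspark.ml.connect (shipped since 3.5, requiring torch at fit time) are still present in this build:

{code:python}
# Hedged workaround sketch, not a fix: the pyspark.ml.connect estimators are
# pure Python + Connect and never touch pyspark.core. Whether they remain in
# 4.0.0.dev2, and their exact input expectations (they may want an
# array<double> features column rather than ml Vectors), are assumptions.
from pyspark.ml.connect.classification import LogisticRegression as ConnectLR

lr = ConnectLR(maxIter=19, tol=0.0023)  # constructs without pyspark.core
model = lr.fit(df)                      # df from the reproducer above
model.transform(df).show()
{code}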

 


