Bobby Wang created SPARK-51537:
----------------------------------

             Summary: Failed to run third-party Spark ML library on Spark Connect
                 Key: SPARK-51537
                 URL: https://issues.apache.org/jira/browse/SPARK-51537
             Project: Spark
          Issue Type: Bug
          Components: Connect, ML
    Affects Versions: 4.0.0, 4.1
            Reporter: Bobby Wang

I've encountered an issue where a third-party Spark ML library may fail to run
on Spark Connect. The problem occurs when the third-party ML jar is specified
with the *--jars* option while starting a Connect server backed by a Spark
standalone cluster.
 
The exception thrown is a ClassCastException:
 
_Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD_
        _at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2096)_
        
However, if I place the ML jar into the *$SPARK_HOME/jars* directory and 
restart both the Spark standalone cluster and the Spark Connect server, it runs 
without any exceptions.
 
Alternatively, calling
*spark.addArtifacts("target/com.example.ml-1.0-SNAPSHOT.jar")* directly in the
Python code also resolves the issue.
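
The client-side workaround above can be sketched roughly as follows. This is only an illustration, not code from the repro project: the helper name add_ml_jar, the .jar suffix check, and the sc://localhost:15002 endpoint are assumptions, and the jar path is the one quoted above.

```python
def add_ml_jar(spark, jar_path):
    """Upload a local jar to a Spark Connect session as an artifact.

    `spark` is assumed to be a Spark Connect session (an object exposing
    `addArtifacts`); `jar_path` is a path on the client machine.
    `add_ml_jar` is a hypothetical helper name used for illustration.
    """
    if not jar_path.endswith(".jar"):
        raise ValueError(f"expected a .jar file, got: {jar_path}")
    # Per the report, adding the jar from the client avoids the
    # ClassCastException seen when the jar is only passed via --jars.
    spark.addArtifacts(jar_path)
    return jar_path


def main():
    # Imported here so the helper above can be exercised without pyspark.
    from pyspark.sql import SparkSession

    # Assumed endpoint: adjust to wherever the Connect server listens.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    add_ml_jar(spark, "target/com.example.ml-1.0-SNAPSHOT.jar")
    # ... fit or transform with the third-party ML estimator here ...
```

Calling main() requires a running Spark Connect server; the helper itself only forwards the path to the session.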
 
I have created a minimal project that reproduces this issue; more details can be
found at [https://github.com/wbo4958/ConnectMLIssue]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
