Bobby Wang created SPARK-51537:
----------------------------------

             Summary: Failed to run third-party Spark ML library on Spark Connect
                 Key: SPARK-51537
                 URL: https://issues.apache.org/jira/browse/SPARK-51537
             Project: Spark
          Issue Type: Bug
          Components: Connect, ML
    Affects Versions: 4.0.0, 4.1
            Reporter: Bobby Wang
I've encountered an issue where a third-party Spark ML library fails to run on Spark Connect. The problem occurs when the third-party ML jar is specified with the *--jars* option while starting a Spark Connect server against a Spark standalone cluster. The job fails with a ClassCastException:

_Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD_
_at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2096)_

However, if I place the ML jar in the *$SPARK_HOME/jars* directory and restart both the Spark standalone cluster and the Spark Connect server, it runs without any exceptions. Alternatively, calling *spark.addArtifacts("target/com.example.ml-1.0-SNAPSHOT.jar")* directly in the Python client code also resolves the issue.

I have created a minimal project that reproduces this issue; more details can be found at [https://github.com/wbo4958/ConnectMLIssue]
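The *spark.addArtifacts* workaround described in this report can be wrapped in a small client-side helper. This is a minimal sketch, assuming a PySpark Spark Connect session (Spark 3.5 or later, where SparkSession.addArtifacts exists); the helper name ensure_ml_jar_on_session and the server URL in the usage comment are hypothetical, and only the jar path comes from the report. It merely registers the jar as a session artifact from the client; it does not address why *--jars* on the Connect server is insufficient.

```python
import os


def ensure_ml_jar_on_session(spark, jar_path):
    """Register a third-party ML jar as an artifact of a Spark Connect session.

    Hypothetical helper illustrating the addArtifacts workaround for SPARK-51537.
    `spark` is assumed to be a Spark Connect SparkSession (PySpark 3.5+).
    """
    # Reject non-jar paths early so the error is clear on the client side.
    if not jar_path.endswith(".jar"):
        raise ValueError(f"expected a .jar file, got {jar_path!r}")
    # Fail fast if the jar was not built (e.g. the Maven target/ dir is missing).
    if not os.path.isfile(jar_path):
        raise FileNotFoundError(jar_path)
    # Upload the jar to the server as a session-scoped artifact.
    spark.addArtifacts(jar_path)
    return jar_path


# Usage (needs a running Spark Connect server; the URL is an assumption):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
#   ensure_ml_jar_on_session(spark, "target/com.example.ml-1.0-SNAPSHOT.jar")
```

The helper takes the session as a parameter rather than creating one, so the same call can be made right after the session is built and before any third-party estimator is used.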