[
https://issues.apache.org/jira/browse/SPARK-51343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931227#comment-17931227
]
Ben Burnett commented on SPARK-51343:
-------------------------------------
Seems like it's related to this:
https://issues.apache.org/jira/browse/SPARK-47712
> RelationPlugin scala signature does not match bytecode
> ------------------------------------------------------
>
> Key: SPARK-51343
> URL: https://issues.apache.org/jira/browse/SPARK-51343
> Project: Spark
> Issue Type: Bug
> Components: Connect, Connect Contrib
> Affects Versions: 3.5.4
> Reporter: Ben Burnett
> Priority: Minor
>
> I'm writing a dataframe plugin for Spark Connect to support functionality
> that previously used py4j, and it seems like the RelationPlugin class has
> mismatched Scala and Java signatures in the binary: `com.google.protobuf.Any`
> is shaded in the bytecode to `org.sparkproject.connect.protobuf.Any` but
> remains `com.google.protobuf.Any` in the Scala signature annotation.
> Here's the class as decompiled by jd-gui:
> public interface RelationPlugin {
>   Option<LogicalPlan> transform(Any paramAny, SparkConnectPlanner paramSparkConnectPlanner);
> }
> Here's the bytecode method descriptor:
> public abstract transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;
> Here's the class as decompiled by IntelliJ (I'm guessing it uses the Scala
> signature to inform decompilation, but I'm not sure):
> trait RelationPlugin {
>   def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
> }
> I'm a bit new to Scala signatures, but when I run ScalaSigParser on the
> class file, I see lots of references to com.google.protobuf.Any:
> 40: TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
> 41: ThisType(com.google.protobuf)
> 42: com.google.protobuf
> Basically this presents a challenge: my class is validated against the
> Scala signature (which uses com.google) at compile time, but at runtime it's
> matched against the bytecode (which uses org.sparkproject.connect), so the
> interface effectively changes between the two. A potential solution is to
> shade protobuf to org.sparkproject.connect like [another maintainer did
> here|https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247],
> but that seems error-prone and I don't want to include the protobuf jar in
> my final output.
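> For reference, the maven-shade-plugin relocation in the linked pom looks
> roughly like this (a sketch; the exact configuration in that project may
> differ):
> {code:xml}
> <!-- Relocate protobuf classes to match Spark Connect's shaded namespace. -->
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-shade-plugin</artifactId>
>   <executions>
>     <execution>
>       <phase>package</phase>
>       <goals><goal>shade</goal></goals>
>       <configuration>
>         <relocations>
>           <relocation>
>             <pattern>com.google.protobuf</pattern>
>             <shadedPattern>org.sparkproject.connect.protobuf</shadedPattern>
>           </relocation>
>         </relocations>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>
> {code}
> With this, the plugin compiles against the unshaded com.google.protobuf.Any
> but the packaged class file references the relocated name Spark expects at
> runtime.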
> I understand not fixing this, since it seems like the interface is being
> changed in Spark 4, but I'm not sure how to handle it at runtime. Is the
> solution just to shade protobuf myself so that my plugin passes compile
> checks but reflects the correct runtime signature, like Semyon did?
> Apologies if I'm creating a duplicate issue; I looked and couldn't find
> anything referencing this in the existing issues. This is my first issue, so
> apologies if I've linked or set it up incorrectly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]