[
https://issues.apache.org/jira/browse/SPARK-51343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ben Burnett updated SPARK-51343:
--------------------------------
Description:
I'm writing a DataFrame plugin for Spark Connect to support functionality that
previously used Py4J, and it looks like the RelationPlugin interface has
mismatched Scala and Java signatures in the published binary:
`com.google.protobuf.Any` is shaded in the bytecode to
`org.sparkproject.connect.protobuf.Any`, but remains
`com.google.protobuf.Any` in the ScalaSignature annotation.
Here's the interface as decompiled from the bytecode by JD-GUI:

public interface RelationPlugin {
  Option<LogicalPlan> transform(Any paramAny, SparkConnectPlanner paramSparkConnectPlanner);
}
Here's the raw method descriptor in the bytecode:
public abstract
transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;
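(For reference, this descriptor can be dumped with javap against the published
jar; the jar and class names below are my best guess for a 3.5.4 / Scala 2.12
build:)

javap -s -p -cp spark-connect_2.12-3.5.4.jar org.apache.spark.sql.connect.plugin.RelationPlugin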
Here's the class as decompiled by IntelliJ (I'm guessing this uses the Scala
signature to inform decompilation, but I'm not sure):

trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}
I'm a bit new to Scala signatures, but when I run ScalaSigParser on the class I
see lots of references to com.google.protobuf.Any:

40: TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
41: ThisType(com.google.protobuf)
42: com.google.protobuf
This presents a challenge: at compile time my class is validated against the
Scala signature (which uses com.google), but at runtime the JVM resolves the
method against the bytecode (which uses org.sparkproject.connect), so the
interface effectively changes between compile time and runtime. A potential
solution is to shade protobuf to org.sparkproject.connect in my own build, like
[another maintainer did here|https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247],
but that seems error prone and I don't want to include the protobuf jar in my
final output.
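To make the failure concrete, here's a minimal sketch of the kind of plugin
class I'm writing (the class name and no-op body are hypothetical
placeholders):

import com.google.protobuf.{Any => ProtoAny}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

class MyRelationPlugin extends RelationPlugin {
  // Compiles: scalac checks this override against the Scala signature,
  // which still names com.google.protobuf.Any.
  override def transform(relation: ProtoAny, planner: SparkConnectPlanner): Option[LogicalPlan] =
    None // hypothetical no-op body
}

At runtime the planner invokes
transform(Lorg/sparkproject/connect/protobuf/Any;...)Lscala/Option;, which this
class never defines, so the call presumably fails with something like an
AbstractMethodError.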
I understand if this won't be fixed, since it seems like the interface is
changing in Spark 4 anyway, but I'm not sure how to handle it at runtime. Is
the solution just to shade protobuf myself, so that my class passes compile
checks but reflects the correct runtime signature, like Semyon did?
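For reference, the relocation in the linked pom boils down to something like
this maven-shade-plugin snippet (a sketch of the approach, not a drop-in
config):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite my compiled protobuf references so they match Spark's shaded bytecode -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.sparkproject.connect.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>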
Apologies if I'm creating a duplicate issue; I looked and couldn't find
anything referencing this in the existing issues. This is my first issue, so
apologies if I've linked or set this up incorrectly.
> RelationPlugin scala signature does not match bytecode
> ------------------------------------------------------
>
> Key: SPARK-51343
> URL: https://issues.apache.org/jira/browse/SPARK-51343
> Project: Spark
> Issue Type: Bug
> Components: Connect, Connect Contrib
> Affects Versions: 3.5.4
> Reporter: Ben Burnett
> Priority: Minor
>