Ben Burnett created SPARK-51343:
-----------------------------------
Summary: RelationPlugin scala signature does not match bytecode
Key: SPARK-51343
URL: https://issues.apache.org/jira/browse/SPARK-51343
Project: Spark
Issue Type: Bug
Components: Connect, Connect Contrib
Affects Versions: 3.5.4
Reporter: Ben Burnett
Fix For: 3.5.4
I'm writing a DataFrame plugin for Spark Connect to support functionality that
previously used Py4J, and it seems the RelationPlugin class has mismatched
Scala and Java signatures in the binary. `com.google.protobuf.Any` appears to
be shaded to `org.sparkproject.connect.protobuf.Any` in the bytecode but
remains `com.google.protobuf.Any` in the Scala signature annotation.
Here's the jd-gui reassembled source:
{code:java}
public interface RelationPlugin {
  Option<LogicalPlan> transform(Any paramAny, SparkConnectPlanner paramSparkConnectPlanner);
}
{code}
Here's the bytecode:
{code}
public abstract transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;
{code}
Here's the IntelliJ reassembled class (I'm guessing this uses the Scala
signature to inform reassembly, but I'm not sure):
{code:scala}
trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}
{code}
I'm a bit new to Scala signatures, but when I run ScalaSigParser on it, I see
lots of references to com.google.protobuf.Any:
{code}
40: TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
41: ThisType(com.google.protobuf)
42: com.google.protobuf
{code}
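For anyone trying to reproduce this, the two views can be compared directly
from the published jar with javap (JVM bytecode) and scalap (Scala signature).
This is just a sketch: the jar filename below is an assumption and will vary
with your Spark/Scala versions.

```shell
# Assumed jar name; adjust to your Spark/Scala versions.
JAR=spark-connect_2.12-3.5.4.jar

# JVM-level view: the parameter type is the shaded
# org.sparkproject.connect.protobuf.Any
javap -p -classpath "$JAR" org.apache.spark.sql.connect.plugin.RelationPlugin

# Scala-signature view: the parameter type is the unshaded
# com.google.protobuf.Any
scalap -classpath "$JAR" org.apache.spark.sql.connect.plugin.RelationPlugin
```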
Basically this presents a challenge: at compile time my class is validated
against the Scala signature (which uses com.google.protobuf), but at runtime
it's resolved against the bytecode (which uses org.sparkproject.connect), so
the interface effectively changes between compile time and runtime. A potential
solution is to shade protobuf to org.sparkproject.connect like [another
maintainer did
here|https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247],
but that seems error-prone and I don't want to include the protobuf jar in my
final output.
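For completeness, the relocation workaround in the linked pom looks roughly
like the following maven-shade-plugin sketch. This is illustrative only, not
copied from any build; it assumes the plugin jar compiles against stock
protobuf and rewrites those references to Spark Connect's shaded package.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite the plugin's com.google.protobuf references so
                 they match Spark Connect's shaded package at runtime. -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.sparkproject.connect.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The drawback, as noted above, is that this pulls protobuf classes into the
final artifact unless they are also excluded from the shaded jar.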
I understand if this won't be fixed, since it seems the interface is being
changed in Spark 4, but I'm not sure how to handle this at runtime. Is the
solution just to shade it myself so that it passes compile checks but then
reflects the correct runtime signature, like Semyon did?
Apologies if I'm creating a duplicate issue; I looked and couldn't find
anything referencing this in the existing issues. This is my first issue, so
apologies if I've linked or set this up incorrectly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)