Ben Burnett created SPARK-51343:
-----------------------------------

             Summary: RelationPlugin scala signature does not match bytecode
                 Key: SPARK-51343
                 URL: https://issues.apache.org/jira/browse/SPARK-51343
             Project: Spark
          Issue Type: Bug
          Components: Connect, Connect Contrib
    Affects Versions: 3.5.4
            Reporter: Ben Burnett
             Fix For: 3.5.4


I'm writing a DataFrame plugin for Spark Connect to support functionality that 
previously used py4j, and it seems like the RelationPlugin class has mismatched 
Scala and Java signatures in the published binary: `com.google.protobuf.Any` is 
shaded in the bytecode to `org.sparkproject.connect.protobuf.Any` but remains 
`com.google.protobuf.Any` in the Scala signature annotation.

Here's the jd-gui decompiled interface:

public interface RelationPlugin {
    Option<LogicalPlan> transform(Any paramAny, SparkConnectPlanner paramSparkConnectPlanner);
}

Here's the bytecode:
public abstract transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;


Here's the IntelliJ decompiled class (I'm guessing it uses the Scala signature 
to inform the decompilation, but I'm not sure):
trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}
I'm a bit new to Scala signatures, but when I run ScalaSigParser on it, I see 
lots of references to com.google.protobuf.Any:
40:    TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
41:    ThisType(com.google.protobuf)
42:    com.google.protobuf
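
To make the mismatch concrete, here's a stripped-down sketch of my plugin (the 
class name and body are placeholders, not my real code):

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

// scalac type-checks this override against the Scala signature, which declares
// the first parameter as com.google.protobuf.Any, so it compiles, and the
// emitted bytecode takes com.google.protobuf.Any. The interface method in the
// spark-connect jar takes the shaded org.sparkproject.connect.protobuf.Any,
// so at runtime this method never actually implements the interface's transform.
class MyRelationPlugin extends RelationPlugin {
  override def transform(
      relation: com.google.protobuf.Any,
      planner: SparkConnectPlanner): Option[LogicalPlan] = {
    None // placeholder: a real plugin would inspect relation.getTypeUrl and build a plan
  }
}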

Basically this presents a challenge: at compile time my class seems to be 
validated against the Scala signature (which uses com.google.protobuf), but at 
runtime it's the bytecode (which uses org.sparkproject.connect.protobuf) that 
matters, so the interface effectively changes between compile time and runtime. 
A potential solution is to shade protobuf to org.sparkproject.connect in my own 
build, like [another maintainer did 
here|https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247], 
but that seems error prone and I don't want to include the protobuf jar in my 
final output.

I understand if this doesn't get fixed, since it seems like the interface is 
changing in Spark 4, but I'm not sure how to handle this at runtime. Is the 
solution just to shade it myself so that the code passes compile checks but 
reflects the correct runtime signature, like Semyon did?

Apologies if I'm creating a duplicate issue; I looked and couldn't find 
anything referencing this in the existing issues. This is also my first issue, 
so apologies if I've linked or set this up incorrectly.


