[
https://issues.apache.org/jira/browse/SPARK-51343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ben Burnett updated SPARK-51343:
--------------------------------
Description:
I'm writing a DataFrame plugin for Spark Connect to support functionality that
previously used Py4J, and it looks like the RelationPlugin interface has
mismatched Scala and Java signatures in the published binary:
`com.google.protobuf.Any` is shaded in the bytecode to
`org.sparkproject.connect.protobuf.Any`, but remains
`com.google.protobuf.Any` in the ScalaSignature annotation.
Here's the interface as decompiled from the bytecode by JD-GUI:

public interface RelationPlugin {
  Option<LogicalPlan> transform(Any paramAny, SparkConnectPlanner paramSparkConnectPlanner);
}
Here's the raw method descriptor in the bytecode:
public abstract
transform(Lorg/sparkproject/connect/protobuf/Any;Lorg/apache/spark/sql/connect/planner/SparkConnectPlanner;)Lscala/Option;
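(For reference, this descriptor can be dumped with javap against the published
jar; the jar and class names below are my best guess for a 3.5.4 / Scala 2.12
build:)

javap -s -p -cp spark-connect_2.12-3.5.4.jar org.apache.spark.sql.connect.plugin.RelationPlugin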
Here's the class as decompiled by IntelliJ (I'm guessing this uses the Scala
signature to inform decompilation, but I'm not sure):

trait RelationPlugin {
  def transform(relation: com.google.protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan]
}
I'm a bit new to Scala signatures, but when I run ScalaSigParser on the class I
see lots of references to com.google.protobuf.Any:

40: TypeRefType(ThisType(com.google.protobuf),com.google.protobuf.Any,List())
41: ThisType(com.google.protobuf)
42: com.google.protobuf
This presents a challenge: at compile time my class is validated against the
Scala signature (which uses com.google), but at runtime the JVM resolves the
method against the bytecode (which uses org.sparkproject.connect), so the
interface effectively changes between compile time and runtime. A potential
solution is to shade protobuf to org.sparkproject.connect in my own build, like
[another maintainer did here|https://github.com/SemyonSinchenko/tsumugi-spark/blob/ac95948d3be24508aa236927ddc379fd36708d14/tsumugi-server/pom.xml#L247],
but that seems error prone and I don't want to include the protobuf jar in my
final output.
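To make the failure concrete, here's a minimal sketch of the kind of plugin
class I'm writing (the class name and no-op body are hypothetical
placeholders):

import com.google.protobuf.{Any => ProtoAny}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin

class MyRelationPlugin extends RelationPlugin {
  // Compiles: scalac checks this override against the Scala signature,
  // which still names com.google.protobuf.Any.
  override def transform(relation: ProtoAny, planner: SparkConnectPlanner): Option[LogicalPlan] =
    None // hypothetical no-op body
}

At runtime the planner invokes
transform(Lorg/sparkproject/connect/protobuf/Any;...)Lscala/Option;, which this
class never defines, so the call presumably fails with something like an
AbstractMethodError.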
I understand if this won't be fixed, since it seems like the interface is
changing in Spark 4 anyway, but I'm not sure how to handle it at runtime. Is
the solution just to shade protobuf myself, so that my class passes compile
checks but reflects the correct runtime signature, like Semyon did?
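For reference, the relocation in the linked pom boils down to something like
this maven-shade-plugin snippet (a sketch of the approach, not a drop-in
config):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite my compiled protobuf references so they match Spark's shaded bytecode -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.sparkproject.connect.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>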
Apologies if I'm creating a duplicate issue; I looked and couldn't find
anything referencing this in the existing issues. This is my first issue, so
apologies if I've linked or set this up incorrectly.
> RelationPlugin scala signature does not match bytecode
> ------------------------------------------------------
>
> Key: SPARK-51343
> URL: https://issues.apache.org/jira/browse/SPARK-51343
> Project: Spark
> Issue Type: Bug
> Components: Connect, Connect Contrib
> Affects Versions: 3.5.4
> Reporter: Ben Burnett
> Priority: Minor
>