Dear community,

I am working on a popular open-source connector that provides a custom Data 
Source V2 strategy, a useful planning extension to Spark, yet I can't seem 
to reconcile this with the API changes around adding extensions in Spark 4.

We add a custom planner strategy that utilises the Data Source V2 strategy 
API<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>.

The DSv2 strategy requires a session of type 
org.apache.spark.sql.classic.SparkSession<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>, 
but from what I can gather, any strategy builder must implement SparkSession 
=> 
Strategy<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L112>, 
where that SparkSession is of type 
org.apache.spark.sql.SparkSession<https://github.com/apache/spark/blob/655061aaf728231bdd881edbb721202c4a618fb8/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala#L75> 
and not of type org.apache.spark.sql.classic.SparkSession.
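
To make the mismatch concrete, here is a rough sketch (the class and 
variable names are mine, and the last line is exactly what no longer 
type-checks in Spark 4, as far as I can tell):

    import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
    import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy

    class ConnectorExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        // The builder hands us the unified org.apache.spark.sql.SparkSession...
        extensions.injectPlannerStrategy { (session: SparkSession) =>
          // ...but DataSourceV2Strategy's constructor expects
          // org.apache.spark.sql.classic.SparkSession, so this no longer
          // type-checks in Spark 4:
          new DataSourceV2Strategy(session)
        }
      }
    }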

My primary question is: how should we handle receiving a handle to the new 
SparkSession (org.apache.spark.sql.SparkSession) while also needing a handle 
to the classic SparkSession (org.apache.spark.sql.classic.SparkSession) so 
that we can construct a DSv2 
strategy<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>?
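
For illustration, something like the following downcast inside the builder 
does type-check, but I assume it is unsupported and would fail at runtime on 
the Connect path (OurDsv2Strategy is a stand-in name for our actual 
strategy, which builds on DataSourceV2Strategy):

    extensions.injectPlannerStrategy { session =>
      // Unchecked cast from the unified session to the classic
      // implementation; presumably this throws ClassCastException whenever
      // the session is not backed by the classic implementation.
      val classicSession =
        session.asInstanceOf[org.apache.spark.sql.classic.SparkSession]
      new OurDsv2Strategy(classicSession)
    }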

It's unclear whether I should be refactoring away from DSv2, or whether 
extensions of this type should no longer be supported, given that the 
extension builder only provides org.apache.spark.sql.SparkSession.

While I'm still learning about the changes and intentions in Spark 4, I am 
curious about the following:

  * How is it intended that we reconcile an extension utilising DSv2 and, 
therefore, org.apache.spark.sql.classic.SparkSession?

  * What are the recommended alternatives? Is there something I should look 
at implementing instead of Data Source V2? Are there newer APIs that will 
have better compatibility with Spark 4 and with use cases I've likely not 
yet considered, e.g. relating to Connect?



I'm happy to push a branch up to Git if the explanation is not clear or an 
example is needed.

Kind Regards,

Jack Buggins
Software Engineer




