Dear community,

I am working on a popular open-source connector that provides a custom Data Source V2 strategy, which gives Spark a useful planning extension, yet I can't seem to reconcile the API updates in Spark 4 with how such extensions are added.
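To make this concrete before going into the details, here is a minimal sketch of roughly how the connector wires this up today (class names such as MyConnectorStrategy and MyConnectorExtensions are placeholders rather than the actual connector code). It compiles against Spark 3.x; the marked constructor call is where the mismatch appears under Spark 4:

import org.apache.spark.sql.{SparkSession, SparkSessionExtensions, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy

// Placeholder strategy that handles connector-specific plans and delegates
// the rest to Spark's built-in DSv2 planning.
class MyConnectorStrategy(session: SparkSession) extends Strategy {
  // Under Spark 3.x this compiles because the builder hands us the same
  // SparkSession type the constructor expects. Under Spark 4 the constructor
  // takes org.apache.spark.sql.classic.SparkSession, so this is where the
  // mismatch surfaces.
  private val dsv2 = new DataSourceV2Strategy(session)

  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // ... connector-specific cases elided ...
    case other => dsv2(other)
  }
}

// Registered through spark.sql.extensions; injectPlannerStrategy expects a
// builder of type SparkSession => Strategy.
class MyConnectorExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPlannerStrategy(session => new MyConnectorStrategy(session))
  }
}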
We add a custom planner strategy that builds on the Data Source V2 strategy API<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>. The DSv2 strategy requires a session of type org.apache.spark.sql.classic.SparkSession<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>, but from what I can gather any strategy builder must implement SparkSession => Strategy<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L112>, where that SparkSession is of type org.apache.spark.sql.SparkSession<https://github.com/apache/spark/blob/655061aaf728231bdd881edbb721202c4a618fb8/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala#L75> and not of type org.apache.spark.sql.classic.SparkSession<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>.

My primary question is: how should we handle receiving a handle to a SparkSession (org.apache.spark.sql.SparkSession) while also needing a handle to the classic SparkSession (org.apache.spark.sql.classic.SparkSession) so we can construct a DSv2 strategy<https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala#L56>? It's unclear whether I should be refactoring away from Data Source V2, or whether extensions of this type (using DSv2) are no longer supported because the extension builder only provides org.apache.spark.sql.SparkSession.

While I'm still learning about the changes and intentions in v4, I am curious:

* How is it intended that we reconcile an extension that uses DSv2 / org.apache.spark.sql.classic.SparkSession?
* What are the recommended alternatives? Is there something I should implement instead of Data Source V2? Are there newer APIs that will have better compatibility with Spark 4 and cover use cases I've likely not yet considered, e.g. relating to Connect?

I'm happy to push a branch up to git if the explanation is not clear or an example is needed.

Kind Regards,

Jack Buggins
Software Engineer