Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. IIUC, that’s very related but I’m curious if I’m thinking about this correctly: Big gaps between built-in v1

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. To think out loud: The big gaps between built-in v1 and v2 data sources are support for bucketing and partit

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Dongjoon Hyun
Instead of that, I believe you are looking for `spark.sql.sources.useV1SourceList` if the question is about "Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 data sources?". Here is the code. https://github.com/apache/spark/blob/324a07b534ac8c2e83a50ac5ea4c5d93fd57b79
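
For readers unfamiliar with the setting mentioned above: `spark.sql.sources.useV1SourceList` holds a comma-separated list of source names that should keep using the v1 code path. The following is only a simplified pure-Python model of that lookup, not Spark's actual implementation. The default value shown is an assumption based on Spark 3.x and should be checked against `SQLConf` in your version; the helper names are hypothetical.

```python
# Simplified model of how a comma-separated "use v1" list could be consulted.
# ASSUMPTION: this default mirrors spark.sql.sources.useV1SourceList in Spark 3.x;
# verify it against SQLConf for the Spark version you run.
ASSUMED_DEFAULT_V1_LIST = "avro,csv,json,kafka,orc,parquet,text"


def parse_v1_source_list(value):
    """Split the config value into a set of lowercase, trimmed source names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def falls_back_to_v1(source_name, v1_list=ASSUMED_DEFAULT_V1_LIST):
    """Return True if the named source is listed and would stay on the v1 path."""
    return source_name.strip().lower() in parse_v1_source_list(v1_list)


if __name__ == "__main__":
    for fmt in ("parquet", "ORC", "iceberg"):
        print(fmt, falls_back_to_v1(fmt))
```

Under this model, removing a name from the list (for example passing `--conf spark.sql.sources.useV1SourceList=avro,csv,json,kafka,text` to `spark-submit`) would be how a built-in source is opted into its v2 implementation. Whether that is advisable for a given source is exactly what the thread is discussing.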

[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
Hi devs, I'd like to get some input on dealing with a possible correctness issue we found. The JIRA ticket is SPARK-45178, where I described the issue and the solution I proposed. Context: Source might behave incorrectly leading to correctness is