Re: [DISCUSS] SPIP: Add the TIME data type

2025-02-17 Thread Max Gekk
Hello Mich, Thank you for the provided code, but it seems useless in the cases that I described above. No doubt that you can emulate the TIME type via STRING as well as other types. Let me highlight the cases when direct support of the new type by Spark SQL could be useful for users: 1. Load the T

SPARK-50994: Perform RDD conversion under tracked execution

2025-02-17 Thread Harsh Panchal
Hi, Sorry if I am being noisy, but I wanted to draw your attention to SPARK-50994. It was raised because when a `Dataset` is converted into an `RDD`, it executes the `SparkPlan` without any execution context. This leads to: 1. No tracking is available

Re: [DISCUSS] SPIP: Add the TIME data type

2025-02-17 Thread Subhasis Mukherjee
Implementing the type seems a good proposal to handle the mentioned use cases, mainly the migration of data. Plenty of circuitous code can be written to handle such scenarios, but nothing beats a straightforward type implementation IMO. Thanks, Subhasis Mukherjee On Mon, Feb 17, 2025, 9:37 PM Max Gekk wro

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Wenchen Fan
It’s unfortunate that we missed identifying these issues during the code review. However, since they have already been released, I believe deprecating them is a better approach than removing them, as the latter would introduce a breaking change. Regarding Jungtaek’s PR

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Dongjoon Hyun
For Spark 3.5.5, did you see this, which is the best the community can offer? https://github.com/apache/spark/pull/49985 [SPARK-51187][SQL][SS][3.5] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 Dongjoon On Mon, Feb 17, 2025 at 14:38 Bjørn Jørgensen wrote: > > Hav

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Bjørn Jørgensen
Having breaking changes in a minor release does not seem that good. As I'm reading this, "*This could break the query if the rule impacts the query, because the effectiveness of the fix is flipped.*" https://github.com/apache/spark/pull/49897#issuecomment-2652567140 What if we have this https://github.com/

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Jungtaek Lim
I think I can add some color to minimize the concern. The problematic config we added is arguably not user-facing. I'd argue moderate users wouldn't even understand what the flag is doing. The config was added because Structured Streaming has been leveraging SQL config to "do the magic" on having two

Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-17 Thread Dongjoon Hyun
Hi, All. I'd like to highlight this discussion because this is more important and tricky in a way. As already mentioned in the mailing list and PRs, there was an obvious mistake which missed an improper configuration name, `spark.databricks.*`. https://github.com/apache/spark/blob/a6f220d951742f