Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread Ángel
Hi, I’m working on a performance issue that ends up throwing an OutOfMemoryError when AQE is enabled. This problem was first identified by Russell Jurney while running GraphFrames unit tests, as detailed in his gist. The issue was a

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread Mich Talebzadeh
Interesting points: client-server architecture has been around since the days of Sybase. A client written in any language, say Python or Scala, makes a request to the Spark cluster. This remote access model inherently creates a level of isolation between the client application and the internal workings of

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread David Milicevic
Hi all, Together with my team, I'm working on adding support for SQL Scripting (JIRA, Ref Spec). The feature is guarded b

RE: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread Stefan Kandic
Hi, I am working on adding collation support (https://issues.apache.org/jira/projects/SPARK/issues/SPARK-46830). Right now, collations are enabled by default as we have finished almost everything we planned to add. However, there are still some smaller things and improvements left that have on

RE: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread Milan Cupac
I am working on recursive CTEs. Two final PRs should be merged soon: https://github.com/apache/spark/pull/49518 https://github.com/apache/spark/pull/49571 2025/01/15 13:41:07 Wenchen Fan wrote: > Hi all, > > We have cut the "branch-4.0" and I'm sending this email to collect the > information

Re: How do I repackage org.spark-project.hive-exec-1.2.1.spark2

2025-01-22 Thread Mich Talebzadeh
Sorry, I forgot to mention: once you extract the JAR file, copy or symlink it into the $SPARK_HOME/jars directory. HTH Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR On Tue

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Mich Talebzadeh
A broken CI is really an operational matter, albeit in this case it was quite temporary. We should put that aside and move on, as 1) the product is sound and 2) Spark Connect is strategic for the future of Spark. HTH Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Hyukjin Kwon
While it might be a bit too much to talk about its stability, it is true that the CI dedicated to Spark Connect compat was broken there for a couple of weeks, and the errors from the tests look confusing. I agree that tests and builds could be one of the easiest measurements to tell the state of a

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Martin Grund
I'm very confused about how we use stability in CI as a measure to discuss the strategy of a particular feature, particularly because we call these "hallucinations." From real-world experience, I can say that we have thousands of clients using Spark Connect across many different versions in our i