Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-05-07 Thread Takuya UESHIN
Hi, we will have a PR to block pandas API on Spark when ANSI mode is enabled, to avoid it silently producing unexpected results: ANSI mode is now enabled by default, but pandas API on Spark does not work properly under it. - https://issues.apache.org/jira/browse/SPARK-52026 I'll submit a PR shortly. Thanks.
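The actual guard in SPARK-52026 is not shown in the thread; as a rough sketch of the idea (the function name and error message below are hypothetical, only the config key `spark.sql.ansi.enabled` is a real Spark setting), a fail-fast check could look like:

```python
# Hypothetical sketch of the guard described in SPARK-52026: fail fast when
# pandas API on Spark is used while ANSI mode is enabled. This is NOT Spark's
# actual implementation; only the config key is real.

ANSI_CONF_KEY = "spark.sql.ansi.enabled"  # real Spark SQL config key

def check_pandas_on_spark_allowed(conf: dict) -> None:
    """Raise if ANSI mode is enabled, mirroring the proposed block."""
    if conf.get(ANSI_CONF_KEY, "true").lower() == "true":
        raise RuntimeError(
            "pandas API on Spark does not work properly with ANSI mode; "
            f"set {ANSI_CONF_KEY}=false to use it."
        )

# ANSI on (the Spark 4.0 default): the guard rejects the call.
try:
    check_pandas_on_spark_allowed({"spark.sql.ansi.enabled": "true"})
    blocked = False
except RuntimeError:
    blocked = True
print(blocked)  # True

# ANSI off: pandas API on Spark is allowed to proceed.
check_pandas_on_spark_allowed({"spark.sql.ansi.enabled": "false"})
```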

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-23 Thread Szehon Ho
One more small fix (on another topic) for the next RC: https://github.com/apache/spark/pull/50685 Thanks! Szehon

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Rozov, Vlad
Correct, to me it looks like a Spark bug https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to trigger and is reproduced using the test case provided in https://github.com/apache/spark/pull/50594: 1. Spark UninterruptibleThread “task” is interrupted by “test” thread while “task”
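Spark's `UninterruptibleThread` (Scala) defers interrupts that arrive while a critical section is running, which is the mechanism at the heart of the race described above. As a toy Python analogue of that idea only (not Spark's code, and deliberately ignoring the monitor interactions that cause the actual deadlock), an interrupt requested inside the uninterruptible block is recorded and delivered after the block exits:

```python
import threading

class UninterruptibleWorker:
    """Toy analogue of Spark's UninterruptibleThread: interrupt() calls made
    while inside run_uninterruptibly() are deferred, not delivered mid-block."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uninterruptible = False
        self._pending_interrupt = False
        self.interrupted = False

    def interrupt(self):
        with self._lock:
            if self._uninterruptible:
                self._pending_interrupt = True   # defer until the block exits
            else:
                self.interrupted = True          # deliver immediately

    def run_uninterruptibly(self, body):
        with self._lock:
            self._uninterruptible = True
        try:
            body()
        finally:
            with self._lock:
                self._uninterruptible = False
                if self._pending_interrupt:      # deliver deferred interrupt
                    self.interrupted = True
                    self._pending_interrupt = False

w = UninterruptibleWorker()
events = []

def body():
    w.interrupt()                    # arrives mid-block: must be deferred
    events.append(w.interrupted)     # still False inside the block

w.run_uninterruptibly(body)
events.append(w.interrupted)         # delivered after the block: True
print(events)                        # [False, True]
```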

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Wenchen Fan
Correct me if I'm wrong: this is a long-standing Spark bug that is very hard to trigger, but the new Parquet version happens to hit the trigger condition and exposes the bug. If this is the case, I'm +1 to fix the Spark bug instead of downgrading the Parquet version. Let's move the technical discu

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Manu Zhang
I don't think PARQUET-2432 has any issue itself. It looks to have triggered a deadlock case like https://github.com/apache/spark/pull/50594. I'd suggest that we fix forward if possible. Thanks, Manu

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Rozov, Vlad
The deadlock is reproducible without Parquet. Please see https://github.com/apache/spark/pull/50594. Thank you, Vlad

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Cheng Pan
The deadlock is introduced by PARQUET-2432 (1.14.0); if we decide to downgrade, the latest workable version is Parquet 1.13.1. Thanks, Cheng Pan

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Wenchen Fan
+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to https://github.com/apache/spark/pull/50583#issuecomment-2815243571, the Parquet CVE does not affect Spark.

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Yuming Wang
It seems this patch (https://github.com/apache/parquet-java/pull/3196) can avoid the deadlock issue when using Parquet 1.15.1.

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Hyukjin Kwon
That's nice, but we need to wait for them to release and then upgrade, right? Let's revert the Parquet upgrade out of the 4.0 branch since we're not directly affected by the CVE anyway.

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-16 Thread Niranjan Jayakar
I found another bug introduced in 4.0 that breaks Spark Connect client x server compatibility: https://github.com/apache/spark/pull/50604. Once merged, this should be included in the next RC.

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Rozov, Vlad
It may not be an issue introduced by Parquet. It looks like a race condition between Spark UninterruptibleThread and Hadoop/HDFS DFSOutputStream. I tried to resolve the deadlock in https://github.com/apache/spark/pull/50594. Can you give it a try? I will see if I can reproduce the deadlock in a un

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Yuming Wang
This release uses Parquet 1.15.1. It seems Parquet 1.15.1 may cause a deadlock:

Found one Java-level deadlock:
=============================
"Executor 566 task launch worker for task 202024534, task 19644.1 in stage 13967543.0 of app application_1736396393732_100191":
  waiting to lock monitor 0
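The jstack report above shows the classic deadlock shape: two threads each hold one monitor and wait on the other's. A minimal illustration of that class of bug and its standard fix, a single global lock-acquisition order (this is generic Python, not the Spark/Parquet code, and the lock names are purely illustrative):

```python
import threading

# lock_a / lock_b stand in for the two monitors from the jstack report.
lock_a = threading.Lock()
lock_b = threading.Lock()

def ordered(f):
    """Always take lock_a before lock_b, so no wait cycle can form."""
    def wrapper():
        with lock_a:
            with lock_b:
                f()
    return wrapper

results = []

# Two concurrent threads, both following the same acquisition order.
t1 = threading.Thread(target=ordered(lambda: results.append("t1")))
t2 = threading.Thread(target=ordered(lambda: results.append("t2")))
t1.start(); t2.start()
t1.join(timeout=5); t2.join(timeout=5)
print(sorted(results))   # both finish; no deadlock
```

Had one thread taken lock_b before lock_a instead, the two could each hold one lock and block forever on the other, which is what the report describes.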

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Yuming Wang
I have reported this issue to the Parquet community: https://github.com/apache/parquet-java/issues/3193

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Wenchen Fan
Hi Yuming, 1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a known issue the Parquet community is working on, or are you still investigating it? If the issue is confirmed by the Parquet community, we can probably roll back to the previous Parquet version for Spark 4.0. Than

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Hyukjin Kwon
Made a fix at https://github.com/apache/spark/pull/50575 👍

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Wenchen Fan
I'm testing the new spark-connect distribution and here are the results: 4 packages are tested: pip install pyspark, pip install pyspark_connect (I installed them with the RC4 pyspark tarballs), the classic tarball (spark-4.0.0-bin-hadoop3.tgz), the connect tarball (spark-4.0.0-bin-hadoop3-spark-con