Hi, we will have a PR to block the pandas API on Spark under ANSI mode, to
avoid it implicitly working in unexpected ways: ANSI mode is now enabled by
default, but the pandas API on Spark won't work properly under ANSI mode.
- https://issues.apache.org/jira/browse/SPARK-52026
I'll submit a PR shortly.
Thanks.
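For context on why ANSI mode changes behavior: under ANSI mode, arithmetic errors such as integer overflow raise exceptions instead of silently producing wrapped or null results, which conflicts with the semantics the pandas API on Spark assumes. A rough plain-Java analogy of that behavioral difference (illustrative only, not Spark code):

```java
public class AnsiOverflowAnalogy {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;

        // Non-ANSI-style behavior: overflow silently wraps around,
        // producing a surprising but non-failing result.
        System.out.println("wrapped: " + (max + 1));

        // ANSI-style behavior: overflow raises an error instead of
        // silently returning a wrong value.
        try {
            Math.addExact(max, 1);
        } catch (ArithmeticException e) {
            System.out.println("ANSI-style failure: " + e.getMessage());
        }
    }
}
```

Blocking the pandas API under ANSI mode avoids the first behavior being silently replaced by the second.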
One more small fix (on another topic) for the next RC:
https://github.com/apache/spark/pull/50685
Thanks!
Szehon
On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad
wrote:
> Correct, to me it looks like a Spark bug
> https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to
> trigger and i
Correct, to me it looks like a Spark bug
https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to trigger
and is reproducible using the test case provided in
https://github.com/apache/spark/pull/50594:
1. Spark UninterruptibleThread “task” is interrupted by “test” thread while
“task”
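The scenario above (a "test" thread interrupting a Spark UninterruptibleThread mid-critical-section) hinges on interrupts being deferred until the critical section finishes. A much-simplified sketch of that deferral idea, not Spark's actual UninterruptibleThread implementation (names and structure here are illustrative):

```java
import java.util.concurrent.CountDownLatch;

public class DeferredInterruptDemo {

    // Simplified analogue of an "uninterruptible" section: interrupts that
    // arrive while the body runs are observed afterwards, not during it.
    static void runUninterruptibly(Runnable body) {
        try {
            body.run();
        } finally {
            // Thread.interrupted() clears the flag; re-assert it so code
            // after the section still sees the pending interrupt.
            if (Thread.interrupted()) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns true if the interrupt sent mid-section was only visible
    // after the section completed.
    static boolean demo() {
        CountDownLatch inSection = new CountDownLatch(1);
        final boolean[] sawInterruptAfter = {false};

        Thread task = new Thread(() -> {
            runUninterruptibly(() -> {
                inSection.countDown();
                // Busy-wait stands in for work that must not be aborted;
                // a blocking call would throw InterruptedException instead.
                long end = System.nanoTime() + 200_000_000L;
                while (System.nanoTime() < end) { /* spin */ }
            });
            sawInterruptAfter[0] = Thread.currentThread().isInterrupted();
        });
        task.start();
        try {
            inSection.await();
            task.interrupt(); // the "test" thread interrupts the "task" thread
            task.join();
        } catch (InterruptedException ignored) { }
        return sawInterruptAfter[0];
    }

    public static void main(String[] args) {
        System.out.println("interrupt visible only after section: " + demo());
    }
}
```

The real implementation additionally has to coordinate with monitors held by other threads, which is where the deadlock window opens.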
Correct me if I'm wrong: this is a long-standing Spark bug that is very
hard to trigger, but the new Parquet version happens to hit the trigger
condition and exposes the bug. If this is the case, I'm +1 to fix the Spark
bug instead of downgrading the Parquet version.
Let's move the technical discu
I don't think PARQUET-2432 has any issue itself. It appears to have triggered
a deadlock case like the one in https://github.com/apache/spark/pull/50594.
I'd suggest that we fix forward if possible.
Thanks,
Manu
On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad
wrote:
> The deadlock is reproducible without Parqu
The deadlock is reproducible without Parquet. Please see
https://github.com/apache/spark/pull/50594.
Thank you,
Vlad
On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote:
The deadlock was introduced by PARQUET-2432 (1.14.0); if we decide to
downgrade, the latest workable version is Parquet 1.13.1.
Thank
The deadlock was introduced by PARQUET-2432 (1.14.0); if we decide to
downgrade, the latest workable version is Parquet 1.13.1.
Thanks,
Cheng Pan
> On Apr 21, 2025, at 16:53, Wenchen Fan wrote:
>
> +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
> https://github.com/apache/spark/pu
+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
https://github.com/apache/spark/pull/50583#issuecomment-2815243571, the
Parquet CVE does not affect Spark.
On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon wrote:
> That's nice but we need to wait for them to release, and upgrade right?
It seems this patch (https://github.com/apache/parquet-java/pull/3196) can
avoid the deadlock issue when using Parquet 1.15.1.
On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar
wrote:
> I found another bug introduced in 4.0 that breaks Spark connect client x
> server compatibility: https://github.com/ap
That's nice, but we need to wait for them to release and then upgrade, right?
Let's revert the Parquet upgrade from the 4.0 branch since we're not directly
affected by the CVE anyway.
On Mon, 21 Apr 2025 at 15:42, Yuming Wang wrote:
> It seems this patch(https://github.com/apache/parquet-java/pull/3196)
I found another bug introduced in 4.0 that breaks Spark connect client x
server compatibility: https://github.com/apache/spark/pull/50604.
Once merged, this should be included in the next RC.
On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote:
> Please vote on releasing the following candidate a
It may not be an issue introduced by Parquet. It looks like a race condition
between Spark's UninterruptibleThread and Hadoop/HDFS DFSOutputStream. I tried to
resolve the deadlock in https://github.com/apache/spark/pull/50594. Can you
give it a try? I will see if I can reproduce the deadlock in a un
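The race described here can be pictured as a classic lock-ordering inversion: one thread holds lock A and needs lock B, while another holds B and needs A. A minimal illustration, assuming nothing about the actual Spark/HDFS monitors involved; it uses tryLock with a timeout so the demo can observe the would-be deadlock instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderRaceDemo {
    // Two stand-in locks; not the actual Spark or HDFS monitors.
    static final ReentrantLock lockA = new ReentrantLock();
    static final ReentrantLock lockB = new ReentrantLock();

    static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException ignored) { }
    }

    // Returns true if the opposite-order acquisition would have deadlocked.
    static boolean demo() {
        CountDownLatch bHeld = new CountDownLatch(1);
        CountDownLatch mainDone = new CountDownLatch(1);
        final boolean[] wouldDeadlock = {false};

        // The other thread takes lockB first and, in the real bug, would
        // next need lockA (held below); here it just waits so the demo
        // stays deterministic.
        Thread other = new Thread(() -> {
            lockB.lock();
            try {
                bHeld.countDown();
                await(mainDone);
            } finally {
                lockB.unlock();
            }
        });
        other.start();

        lockA.lock(); // this thread takes lockA first
        try {
            await(bHeld);
            // Needing lockB while "other" holds it (and, in the real bug,
            // needs lockA) is the deadlock. tryLock with a timeout lets us
            // observe it instead of hanging forever.
            try {
                if (!lockB.tryLock(200, TimeUnit.MILLISECONDS)) {
                    wouldDeadlock[0] = true;
                } else {
                    lockB.unlock();
                }
            } finally {
                lockA.unlock();
                mainDone.countDown();
            }
        } catch (InterruptedException ignored) { }
        try { other.join(); } catch (InterruptedException ignored) { }
        return wouldDeadlock[0];
    }

    public static void main(String[] args) {
        System.out.println("would have deadlocked: " + demo());
    }
}
```

Fixes for this class of bug either enforce a single acquisition order or avoid holding one monitor while waiting on the other, which is the direction PR 50594 takes.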
This release uses Parquet 1.15.1. It seems Parquet 1.15.1 may cause
deadlock.
Found one Java-level deadlock:
=============================
"Executor 566 task launch worker for task 202024534, task 19644.1 in stage
13967543.0 of app application_1736396393732_100191":
waiting to lock monitor 0
I have reported this issue to the Parquet community:
https://github.com/apache/parquet-java/issues/3193
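Deadlocks like the one in the jstack excerpt above can also be confirmed programmatically with the JDK's ThreadMXBean. A small self-contained demo (unrelated to the Spark/Parquet code; it manufactures its own two-monitor deadlock and then detects it):

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectDemo {

    static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException ignored) { }
    }

    // Manufactures a two-thread monitor deadlock, then asks the JVM
    // how many threads are deadlocked. Returns that count (0 if none).
    static int detect() {
        final Object monitorA = new Object();
        final Object monitorB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (monitorA) {
                bothHoldFirst.countDown();
                await(bothHoldFirst);
                synchronized (monitorB) { } // blocks forever
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (monitorB) {
                bothHoldFirst.countDown();
                await(bothHoldFirst);
                synchronized (monitorA) { } // blocks forever
            }
        });
        t1.setDaemon(true); // daemon threads so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();

        // Give both threads a moment to block on each other's monitor.
        try { Thread.sleep(500); } catch (InterruptedException ignored) { }

        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads detected: " + detect());
    }
}
```

This is the same detection jstack performs when it prints "Found one Java-level deadlock", so it can be useful in a regression test for a fix like PR 50594.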
On Tue, Apr 15, 2025 at 9:47 AM Wenchen Fan wrote:
> Hi Yuming,
>
> 1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a
> known issue the Parquet community is working on,
Hi Yuming,
1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a
known issue the Parquet community is working on, or are you still
investigating it? If the issue is confirmed by the Parquet community, we
can probably roll back to the previous Parquet version for Spark 4.0.
Than
Made a fix at https://github.com/apache/spark/pull/50575 👍
On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote:
> I'm testing the new spark-connect distribution and here is the result:
>
> 4 packages are tested: pip install pyspark, pip install pyspark_connect (I
> installed them with the RC4 pyspar
I'm testing the new spark-connect distribution and here is the result:
4 packages are tested: pip install pyspark, pip install pyspark_connect (I
installed them with the RC4 pyspark tarballs), the classic tarball
(spark-4.0.0-bin-hadoop3.tgz), the connect tarball
(spark-4.0.0-bin-hadoop3-spark-con