Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-24 Thread Ángel Álvarez Pascua
@Prem Sahoo, could you test both versions of Spark+Hadoop by replacing your "write to MinIO" statement with write.format("noop")? This would help us determine whether the issue lies on the reader side or the writer side. On Sun, 23 Mar 2025 at 4:53, Prem Gmail () wrote: > V2 writer in 3.
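
The experiment suggested above could be sketched roughly as follows. This is a minimal sketch, not the poster's code: the helpers `timed` and `compare_sinks` are hypothetical names, and the DataFrame `df` and the `s3a://` path in the usage comment are placeholders. The idea is that the `noop` sink still executes the read and the query plan but discards the rows, so the difference between the two timings approximates the cost of the writer alone.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def timed(action: Callable[[], T]) -> Tuple[T, float]:
    """Run a zero-argument action and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def compare_sinks(noop_write: Callable[[], object],
                  real_write: Callable[[], object]) -> Dict[str, float]:
    """Time a noop write and the real write; the gap approximates the sink cost."""
    _, noop_s = timed(noop_write)
    _, real_s = timed(real_write)
    return {
        "noop_s": noop_s,
        "real_s": real_s,
        "writer_overhead_s": real_s - noop_s,
    }


# Hypothetical usage with an existing PySpark DataFrame `df`
# (the bucket path is a placeholder, not taken from the thread):
#
#   compare_sinks(
#       lambda: df.write.format("noop").mode("overwrite").save(),
#       lambda: df.write.mode("overwrite").parquet("s3a://bucket/path"),
#   )
```

Running the same comparison on both Spark+Hadoop combinations would show whether the regression tracks `noop_s` (reader or plan) or `writer_overhead_s` (writer). Caching between the two runs can skew a single measurement, so repeating it a few times is advisable.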

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Rozov, Vlad
Overall I don’t buy the solution where tests are skipped based on the presence of a jar file. It looks too fragile to me. What if a bug keeps the jar off the classpath? The test would be skipped, not because the jar was deleted, but because the classpath is incorrect. Thank you, Vlad
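
One way to address the fragility described above is to skip only when the build explicitly declares that the jars were removed, and to fail otherwise. The following is a minimal sketch under stated assumptions, not anything proposed in the thread: the environment variable `TEST_JARS_STRIPPED`, the helper `jar_or_skip`, and the jar path in the usage comment are all hypothetical.

```python
import os
import unittest
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical flag name; it is not part of the Spark build.
STRIPPED_FLAG = "TEST_JARS_STRIPPED"


def jar_or_skip(jar: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the jar path, skipping only on an explicit opt-in.

    A missing jar without the flag is treated as a broken classpath or
    checkout and fails loudly instead of being silently skipped.
    """
    env = os.environ if env is None else env
    if jar.is_file():
        return jar
    if env.get(STRIPPED_FLAG) == "1":
        raise unittest.SkipTest(f"{jar} intentionally removed from source release")
    raise AssertionError(f"{jar} is missing but {STRIPPED_FLAG} is not set")


# Hypothetical usage inside a unittest.TestCase method
# (the path is a placeholder):
#
#   def test_udf_from_jar(self):
#       jar = jar_or_skip(Path("test-resources/example-udf.jar"))
#       ...
```

With this shape, a release script that strips jars would set the flag, while a development checkout with a misconfigured classpath would still produce a test failure.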

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Rozov, Vlad
You cannot remove test jars without disabling tests, as the tests would fail when run against the official source release. Thank you, Vlad On Mar 24, 2025, at 7:32 PM, Hyukjin Kwon wrote: Made a PR first (https://github.com/apache/spark/pull/50378). BTW, I agree that we should have the source

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Hyukjin Kwon
Valid concern. Maybe we can mark tests as ignored when those jars do not exist for now, so a tagged commit will skip those tests while dev commits will still run them. On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim wrote: > Maybe we should also check that it is mandatory for source code being > distributed

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Jungtaek Lim
Maybe we should also check that it is mandatory for source code being distributed under release to be able to pass the test suites? If this is mandatory, we can't just modify the release script to simply remove the jars, because this will break the tests in source code distribution. Actually this

Unsubscribe

2025-03-24 Thread Jaskaran singh

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Hyukjin Kwon
Made a PR first (https://github.com/apache/spark/pull/50378). BTW, I agree that we should have the source code along with the jars, and ideally the dev branch should not contain them either. This is technical debt, and I hope we can improve it incrementally. I will also take a look an

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Hyukjin Kwon
So the issue is source releases (https://github.com/apache/spark/tags) containing those jars, right? Can we add the removal of test jars as part of the release process? They aren't included in binary releases in any event, so removing them on every source release should work. On Tue, 25 Mar 2025 a

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Jungtaek Lim
Let's make this very clear: do we not have the source code to build a jar, or do we have no way to infer which source code was used for the jar? I understand the concern, but if this is a huge issue, why has no one looked into it, while here we just debate whether the affected tests need to be dropped/dis

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Rozov, Vlad
First of all, I don’t think that the conclusion in https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is correct. Jar files included in the source release are compiled from the code, and replacing them with dat or jpeg files won’t work. Including jar files into the source release

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Hyukjin Kwon
I still disagree with just disabling tests and removing the jars without making sure that we will re-enable them. I want to EITHER make sure we have a plan and someone to drive it, so that the tests will be re-enabled, OR have one fix that does it all. Otherwise, my -1 stands if we can't be sure of tha

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Hyukjin Kwon
From what I read in the last discussion in the legal thread (https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we don't really need to rush and block the release. I don't think we should block the release, remove the CI, and just remove the jars. Rozov, the original proposal of

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread Bjørn Jørgensen
https://setuptools.pypa.io/en/latest/history.html#v78-0-2 v78.0.2 (24 Mar 2025). Bugfixes: Postponed removals of deprecated dash-separated and uppercase fields in setup.cfg. All packages with deprecated configurations are advised to move before 2026. (#4911

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-24 Thread Prem Sahoo
The problem is on the writer's side. It takes longer to write to MinIO with Spark 3.5.2 and Hadoop 3.4.1, so it seems there are some technical changes between Hadoop 2.7.6 and 3.4.1 that made the write process slower. On Sun, Mar 23, 2025 at 12:09 AM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.c

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread James Willis
Perhaps it is sufficient to wait for setuptools to revert the change: https://github.com/pypa/setuptools/pull/4911 On Mon, Mar 24, 2025 at 11:38 AM Holden Karau wrote: > I think given the lack of 4.0 release and the amount of folks using > PySpark this is enough to trigger a 3.5 branch release.

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread Holden Karau
I think given the lack of 4.0 release and the amount of folks using PySpark this is enough to trigger a 3.5 branch release.

Re: Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-24 Thread Qi Tan
Hello team, I have already updated the Google doc: https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing. If all looks good, I will raise a vote later this week. Thank you! On Tue, 18 Mar 2025 at 21:39, Qi Tan wrote: > Hello Reynold, I truly appreciate your time and

setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread Sean Owen
I think we're about to hear about this: setuptools 78.0.0, released yesterday, no longer allows dashes in keys in setup.cfg: https://setuptools.pypa.io/en/stable/history.html#v78-0-0 The pyspark packaging has 'description-file' instead of 'description_file' in its setup.cfg, and so will not insta
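
A quick way to check a package for the problem described above is to scan its setup.cfg for dash-separated keys. This is a minimal sketch using only the standard library. The function name `dashed_keys` is hypothetical, and limiting the check to the `[metadata]` section is a simplification chosen to match the `description-file` example; it does not reproduce the validation logic inside setuptools.

```python
import configparser
from typing import List, Tuple


def dashed_keys(setup_cfg_text: str, section: str = "metadata") -> List[Tuple[str, str]]:
    """Return (section, key) pairs in the given section whose key contains a dash."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    if not parser.has_section(section):
        return []
    return [(section, key) for key in parser[section] if "-" in key]


SAMPLE = """\
[metadata]
description-file = README.md
"""

print(dashed_keys(SAMPLE))  # [('metadata', 'description-file')]
```

Renaming the reported key to its underscore form (`description_file`) makes the list come back empty.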

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Rozov, Vlad
Let’s open a formal vote on the subject. I have an open WIP PR, https://github.com/apache/spark/pull/50231, that is currently blocked by a -1. Thank you, Vlad On Mar 24, 2025, at 7:05 AM, Wenchen Fan wrote: It seems there’s no quick fix for this issue. Should we remove these jars and disable the te

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-24 Thread Wenchen Fan
It seems there’s no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy? While this would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars, we can encourage Spark vendors not to cherry-pick this test-di

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-24 Thread Hyukjin Kwon
-1. Scala and PySpark shell are broken by https://github.com/apache/spark/pull/49971. Reverted it for now. On Mon, 24 Mar 2025 at 12:36, Yang Jie wrote: > -1, > > The pull request at https://github.com/apache/spark/pull/49604 introduced > a connection-related example module and successfully mer

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-24 Thread Hyukjin Kwon
+1 On Mon, 24 Mar 2025 at 09:57, Jungtaek Lim wrote: > +1 (non-binding) > > Thanks for initiating this! > > On Sun, Mar 23, 2025 at 3:45 AM serge rielau.com wrote: > >> +1 (non binding) >> >> On Mar 21, 2025, at 12:52 PM, Jules Damji wrote: >> >> +1 (non-binding) >> — >> Sent from my iPhone >>