Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Chao Sun
+1 On Tue, Mar 25, 2025 at 10:22 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > I meant ... a data validation API would be great, but why in the DSv2? > isn't data validation something more general? do we have to use DSv2 to > have our data validated? > > El mié, 26 mar 2025,

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Reynold Xin
Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you should not be coming in guns blazing blaming committers who are trying to keep the master branch sane and clean. On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad wrote: > There is a simple fix. This is exactly what I outline

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-25 Thread Rozov, Vlad
I started working on it. See https://github.com/apache/spark/pull/50213. Review and comments on the PR will help a lot. +1 for 4.1. It won’t be ready for 4.0 and will require extensive testing. I have a few more local changes that fix some tests in sql/hive and should publish another revision s

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Rozov, Vlad
There is a simple fix. This is exactly what I outlined in the e-mail. Prior to reverting commit (on master) it was necessary to see if an easy fix exists. The PR that introduced the error was merged into master 3 weeks ago, so I still don’t get why it was reverted overnight. It was also necessar

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-25 Thread Wenchen Fan
I agree, 4.0 is already in the RC stage and I think it's too late to do such a big version bump for the Hive dependency. We definitely need to do this upgrade and thanks for working on it! On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > That's grea

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
I meant ... a data validation API would be great, but why in the DSv2? isn't data validation something more general? do we have to use DSv2 to have our data validated? El mié, 26 mar 2025, 6:15, Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> escribió: > For me, data validation is one thi

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
For me, data validation is one thing, and exporting that data to an external system is something entirely different. Should data validation be coupled with the external system? I don't think so. But since I'm the only one arguing against this proposal, does that mean I'm wrong? El mié, 26 mar 2025

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Hyukjin Kwon
With the change, the main entry points, Spark shells, don't work and developers cannot debug and test. The snapshots become useless. The tests passed because you did not fix SBT. It needs a larger change. Such change cannot be in the source. I can start a vote if you think this is an issue. On

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Reynold Xin
Is there a fix already available or a very simple fix a committer can create quickly? If yes, we can merge the fix. If there isn't, for major functionality breaking change, we should just revert. That's fairly basic software engineering practices. On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon wro

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Rozov, Vlad
This does not make any sense. 1. There are no broken tests introduced by https://github.com/apache/spark/pull/49971 2. There is no JIRA filed for “the main entry point” 3. “The main entry point” that does not have any unit test suggests that it is not the main entry point. 4. It is not practica

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Wenchen Fan
I’m glad we’ve found a short-term solution to unblock 4.0, but I’m still concerned about the long-term solution. It’s definitely better to fix these tests to generate jar files on the fly rather than relying on pre-compiled jars in the repo. However, these tests were added a long time ago, and the

Unsubscribe

2025-03-25 Thread Huibo Peng

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
I am confused. The consensus is made pretty clearly in https://github.com/apache/spark/pull/50378, CI passed. Now it has 9 +1s from all different groups. Why do we need to change the way? I don't think we should override the community consensus because you think the approach is hacky. On Wed, 26 M

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
I think that there is some miscommunication/misunderstanding, so I’d like to clarify my view on the issue. 1. I don’t think there is a conflict. I think that overall almost all agree that having jar files in the Apache source release does not comply with the Apache release policy and they need

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Jungtaek Lim
Vlad, We are conflicted because you immediately want the project to fix the issue, while Dongjoon stated in the post that he does not want to block the release just because of this. We delayed the release of Apache Spark 4.0.0 a lot already (going to be month"s" now), and I do not want to see us e

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
> Yes, it removes jars from the source release and satisfies the ASF release policy (see item 3 in my e-mail). At the same time it makes source release different from the Github including release tag and I don’t think that in the long term this is the right approach. For the long term, we should re

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Hyukjin Kwon
Rozov, this broke the main entry points of release, Spark shells. Even in the master branch, you build Spark, and cannot use Spark shells. Why don't you submit a PR that contains the proper fix? It is easier to have one PR that has no issue, e.g., reverting backporting etc. On Wed, 26 Mar 2025 at

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
Please see inline. Thank you, Vlad On Mar 25, 2025, at 1:42 PM, Hyukjin Kwon wrote: > - the approach encourages keeping jars files in the Apache Spark repo Yes, and removes it from source releases. I believe this is a minimized change with AS-IS? Yes, it removes jars from the source release a

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
> - the approach encourages keeping jars files in the Apache Spark repo Yes, and removes it from source releases. I believe this is a minimized change with AS-IS? > - it is hard to identify what tests are impacted by jars so they can be properly fixed We have a list of test jars, and I will add th

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
The policy [1] is quite clear and the fact that other projects do not include compiled jars (including test jars) into the source release confirms the rule: "Every ASF release MUST contain one or more source packages, which MUST be sufficient for a user to build and test the release provided the

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Sean Owen
I personally think you are reading this too narrowly; the principle is, as given: "...MUST contain one or more source packages, which MUST be sufficient for a user to build and test the release..." "All releases are in the form of the source materials needed to make changes to the software being re

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
I already cast my vote. To clarify, having compiled unlicensed jars in the source release is strictly against ASF policy [1]. Between a tiny chance that some tests and functionality will break and a small chance that ASF will request pull out of a long awaited release due to the policy violati

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Gengliang Wang
Hi Ángel, Thanks for the feedback. Besides the existing NOT NULL constraint, the proposal suggests enforcing only *check constraints* by default in Spark, as they’re straightforward and practical to validate at the engine level. Additionally, the SPIP proposes allowing connectors (like JDBC) to ha

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Holden Karau
So I think if I understand folks' concerns it’s that we’ve let it slide in the past and at some point we’ve got to stop letting it slide because there is some concern we might not be meeting the ASF guidance here. Personally I think given they’re test artifacts and how delayed Spark 4 is we should

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Reynold Xin
While I'd love to resolve this issue, I still don't understand why we would block the release for this. On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad wrote: > The difference is in the way how tests are disabled. > > - the approach encourages keeping jars files in the Apache Spark repo > - it is

Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Rozov, Vlad
Hi All, I kind of understand why https://github.com/apache/spark/pull/49971 was reverted on the branch-4.0 to allow testing of 4.0 release. Why was it also reverted on the master branch? I don’t see any JIRA being open for the failure. AFAIK, the proper way to handle the issue in Apache project

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
The difference is in the way how tests are disabled. - the approach encourages keeping jars files in the Apache Spark repo - it is hard to identify what tests are impacted by jars so they can be properly fixed - the solution relies on jar being present or not present on the classpath. Tests may

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-25 Thread Prem Sahoo
Just one more variable: Spark 3.5.2 runs on Kubernetes and Spark 3.2.0 runs on YARN. It seems Kubernetes can be a cause of slowness too. On Mar 24, 2025, at 7:10 PM, Prem Gmail wrote: Hello Spark Dev/users, Any one has any clue why and how a better version have performance iss

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
What's the difference between disabling tests for dev and release vs only for release? On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad wrote: > Overall I don’t buy the solution where tests are skipped based on the > presence of a jar file. It looks too fragile to me. What if there is a bug > that does

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-25 Thread Hyukjin Kwon
Just fixed. Thanks guys for the quick fixes proposed. I woke up in my timezone, and went like wow :-). On Tue, 25 Mar 2025 at 05:33, Bjørn Jørgensen wrote: > https://setuptools.pypa.io/en/latest/history.html#v78-0-2 > > v78.0.2 > 24 Mar 2025 > > Bugfixes > Postponed removals of deprecated dash-s