Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Here's a bit of history and context: The project was initially built using SBT ( https://github.com/apache/spark/commit/df29d0ea4c8b7137fdd1844219c7d489e3b0d9c9 ). Later, Maven support was added ( https://github.com/apache/spark/commit/811a32257b1b59b042a2871eede6ee39d9e8a137 ) to provide an alter

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Wenchen Fan
A slightly off-topic but related question: It feels fragile to test with SBT while publishing the release with Maven. How did we end up in this situation? Moreover, since most Spark developers use SBT for their daily work, it becomes even harder to catch issues with the Maven build. On Thu, Mar 27

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Nah, I wasn't clear. Maven and SBT builds are synced for this special code path, e.g., https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 . If you build with Maven and with SBT, the results are almost the same. Now, the fix you landed in Maven (and indeed it was a Maven specifi

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
If it is not broken, can the sync between maven and SBT dependencies/shadow be done in a follow up PR? Thank you, Vlad On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon wrote: It is not broken. The fix you applied would not be applied in SBT. For example, the lines you changed (added in https://git

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Sorry, but I still don’t follow. My PR broke Maven and the fix I provided fixes Maven. SBT was never broken except there is inconsistency between SBT and Maven builds. Can the inconsistency be fixed in a follow up PR? Thank you, Vlad On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon wrote: It is not

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
It is not broken ... because we run SBT in PR builders for ASF resource restrictions and faster builds. We use Maven for releases, so it was only found now. CI did not test your change. The part you are fixing is a special path .. On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad wrote: > If it is not br

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
It is not broken. The fix you applied would not be applied in SBT. For example, the lines you changed (added in https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 ): diff``` - - com.google.common - ${spark.shade.packageName}.connect.guava -

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Jungtaek Lim
+1 on the explanation that this is not happening only to Vlad; it happens routinely as a normal process. Vlad, if we are very strict about ASF voting policy, we have to have three +1s without -1 to merge the code change. I don't think the major projects in ASF follow it - instead, they (including Spark)

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
That only fixes Maven. Both the SBT build and the Maven build should work in the same or a similar way. Let's make sure both work. On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad wrote: > Please see https://github.com/vrozov/spark/tree/spark-shell. I tested > only spark-shell --remote local after building with

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Every Apache project that has graduated from incubation has guards against what you call “chaotic” and what others call breaking best development practices. Such guards include JIRA, unit tests, and PR review. Instead of reverting the commit, I would expect you to open a JIRA and outline what is broken. If you f

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Please see https://github.com/vrozov/spark/tree/spark-shell. I tested only spark-shell --remote local after building with Maven and SBT. It may not be a complete fix, and there is no PR yet. I’ll look into the SBT build issue (assuming there still is one after the fix) once you file a JIRA. Thank you,

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Nicholas Chammas
On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad wrote: > Every graduated from incubating Apache project has guards against what you > name “chaotic” and what other name breaking best development practices. Such > guards include JIRA, unit tests and PR review. Instead of reverting commit, I > would ex

performance issue Spark 3.5.2 on kubernetes

2025-03-26 Thread Prem Sahoo
Hello Team, I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object storage. It was slower than writing to MapR FS with the same tech stack. I then upgraded to Spark 3.5.2 and Hadoop 4.3.1, which started writing to MinIO with V2 fileoutputcom

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Vlad, - Please show me if there is a simple fix. If that's the case, yes, I will revert this out from the master branch. That works for me. - If not, let's make a new PR. - If you feel this is an issue, let's start a vote. Let me know. On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad wrote: > Every g

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Reynold Xin
My advice to you Vlad is that it would be more fruitful to focus on fixing the issue than being extremely dogmatic and wasting everybody’s energy arguing about this. Of course, you are welcome to form your own opinion. On Wed, Mar 26, 2025 at 7:38 AM Rozov, Vlad wrote: > Reynold, I am not sure

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-26 Thread Rozov, Vlad
This is what my WIP PR targets. It will help to identify any compatibility or breaking issues with the new dependency. Thank you, Vlad On Mar 26, 2025, at 3:14 AM, Mich Talebzadeh wrote: Because of dependencies we need to ensure that the underlying artifacts (Hive 4.0.1) is also stable enoug

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Reynold, I am not sure I follow your question. I’ll open a PR with the fix once the JIRA is open. While I am new to the Spark community, I am not new to Apache projects and open source. Committers are guardians of commits, and they keep not only the master branch but the entire source code in shape

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-26 Thread Mich Talebzadeh
Because of dependencies, we need to ensure that the underlying artifacts (Hive 4.0.1) are also stable enough. We should aim to establish that first and then look at release timelines and where it fits. cheers Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR v

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Rozov, please test the patch, check whether a relevant test exists, and add one if it does not. If it is difficult to add a test, describe that in the PR description along with how you tested manually. This is what I think you need to do instead of reverting the revert. Imagine that there are many of su