Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Vlad,

- Please show me if there is a simple fix. If that's the case, yes, I will
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.


On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad  wrote:

> Every graduated from incubating Apache project has guards against what you
> name “chaotic” and what other name breaking best development practices.
> Such guards include JIRA, unit tests and PR review. Instead of reverting
> commit, I would expect you to open JIRA and outline what is broken. If you
> filed JIRA, please provide the link, if not, please open one.
>
> It took me 2 hours to fix Spark shells, so should you open JIRA instead of
> spending time to identify the commit and reverting it, you will save time
> as well. I’ll post fix once JIRA is open and I validate that my
> understanding of what is broken is the same as yours.
>
> Thank you,
>
> Vlad
>
> On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon  wrote:
>
> Rozov, please test the patch, see if there is a relevant test or not, and
> add a test if not there. If it is difficult to add a test, describe it in
> the PR description, and how you manually tested.
> This is what I think you need to do instead of reverting the revert.
> Imagine that there are many of such PRs living in the master branch.
> Thankfully this time you are alone but imagine many people do the same
> things - it will be chaotic. To me, it took 4 hours to identify this single
> commit.
>
> About the change itself, you need to fix SBT together, and it will need
> the changes such as
> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
> .
> Should also test Spark shells, and describe how you tested it as well.
>
> This is what I expect:
> - Please show me if there is a simple fix. If that's the case, yes, I will
> revert this out from the master branch. That works for me.
> - If not, let's make a new PR.
> - If you feel this is an issue, let's start a vote. Let me know.
>
>
>
> On Wed, 26 Mar 2025 at 15:07, Reynold Xin 
> wrote:
>
>> Sorry Vlad - I disagree. Where is the simple fix? As a new contributor,
>> you should not be coming in guns blazing blaming committers who are trying
>> to keep the master branch sane and clean.
>>
>> On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad 
>> wrote:
>>
>>> There is a simple fix. This is exactly what I outlined in the e-mail.
>>> Prior to reverting commit (on master) it was necessary to see if an easy
>>> fix exists. The PR that introduced the error was merged into master 3
>>> weeks ago, so I still don’t get why it was reverted overnight. It was
>>> also necessary to open JIRA and provide reproducible steps. I assume that
>>> steps on the PR is what needs to be fixed, but it will be better to avoid
>>> guessing.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 25, 2025, at 10:05 PM, Reynold Xin 
>>> wrote:
>>>
>>> Is there a fix already available or a very simple fix a committer can
>>> create quickly? If yes, we can merge the fix. If there isn't, for major
>>> functionality breaking change, we should just revert. That's fairly basic
>>> software engineering practices.
>>>
>>>
>>> On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
>>> wrote:
>>>
 With the change, the main entry points, Spark shalls, don't work and
 developers cannot debug and test. The snapshots become uesless.

 The tests passed because you did not fix SBT. It needs a larger change.

 Such change cannot be in the source. I can start a vote if you think
 this is an issue.


 On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad 
 wrote:

> This does not make any sense.
>
> 1. There are no broken tests introduced by
> https://github.com/apache/spark/pull/49971
> 2. There are no JIRA filed for “the main entry point”
> 3. “The main entry point” that does not have any unit test suggests
> that it is not the main entry point.
> 4. It is not practical to revert every commit that introduced some
> functional issue or regression. If regression can’t be fixed, it is
> reverted, if it can be reasonably fixed, the fix is applied to the
> regression. Just as an example,
> https://github.com/apache/spark/pull/42823 introduced a regression.
> It was fixed by https://github.com/apache/spark/pull/50348. Nobody
> asked for the revert.
> 5. The way how revert was handled is not "Apache way”.
>
> It may be easier for you to handle it in a single PR, but it is easier
> for me to provide PR on top of existing changes and it will also provide
> JIRA and a commit that documents why there is an issue. Otherwise it will
> be easier to introduce the error back again.
>
> Thank you,
>
> Vlad
>
>
> On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon 
> wrote:
>
> Rozov, this broke the main entry points of release, Spark shells. Even
> in the mast bra

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Please see https://github.com/vrozov/spark/tree/spark-shell. I tested only 
spark-shell —remote local after building with maven and sbt. It may not be a 
complete fix and there is no PR. I’ll look into SBT build issue (assuming that 
there is still one after the fix) once you file JIRA.

Thank you,

Vlad

On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon  wrote:

Vlad,

- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.


On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad  wrote:
Every graduated from incubating Apache project has guards against what you name 
“chaotic” and what other name breaking best development practices. Such guards 
include JIRA, unit tests and PR review. Instead of reverting commit, I would 
expect you to open JIRA and outline what is broken. If you filed JIRA, please 
provide the link, if not, please open one.

It took me 2 hours to fix Spark shells, so should you open JIRA instead of 
spending time to identify the commit and reverting it, you will save time as 
well. I’ll post fix once JIRA is open and I validate that my understanding of 
what is broken is the same as yours.

Thank you,

Vlad

On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, please test the patch, see if there is a relevant test or not, and add a 
test if not there. If it is difficult to add a test, describe it in the PR 
description, and how you manually tested.
This is what I think you need to do instead of reverting the revert.
Imagine that there are many of such PRs living in the master branch. Thankfully 
this time you are alone but imagine many people do the same things - it will be 
chaotic. To me, it took 4 hours to identify this single commit.

About the change itself, you need to fix SBT together, and it will need the 
changes such as 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0.
Should also test Spark shells, and describe how you tested it as well.

This is what I expect:
- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.



On Wed, 26 Mar 2025 at 15:07, Reynold Xin  wrote:
Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you 
should not be coming in guns blazing blaming committers who are trying to keep 
the master branch sane and clean.

On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad  wrote:
There is a simple fix. This is exactly what I outlined in the e-mail. Prior to 
reverting commit (on master) it was necessary to see if an easy fix exists. The 
PR that introduced the error was merged into master 3 weeks ago, so I still 
don’t get why it was reverted overnight. It was also necessary to open JIRA and 
provide reproducible steps. I assume that steps on the PR is what needs to be 
fixed, but it will be better to avoid guessing.

Thank you,

Vlad

On Mar 25, 2025, at 10:05 PM, Reynold Xin  wrote:

Is there a fix already available or a very simple fix a committer can create 
quickly? If yes, we can merge the fix. If there isn't, for major functionality 
breaking change, we should just revert. That's fairly basic software 
engineering practices.


On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:
With the change, the main entry points, Spark shalls, don't work and developers 
cannot debug and test. The snapshots become uesless.

The tests passed because you did not fix SBT. It needs a larger change.

Such change cannot be in the source. I can start a vote if you think this is an 
issue.


On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad  wrote:
This does not make any sense.

1. There are no broken tests introduced by 
https://github.com/apache/spark/pull/49971
2. There are no JIRA filed for “the main entry point”
3. “The main entry point” that does not have any unit test suggests that it is 
not the main entry point.
4. It is not practical to revert every commit that introduced some functional 
issue or regression. If regression can’t be fixed, it is reverted, if it can be 
reasonably fixed, the fix is applied to the regression. Just as an example, 
https://github.com/apache/spark/pull/42823 introduced a regression. It was 
fixed by https://github.com/apache/spark/pull/50348. Nobody asked for the 
revert.
5. The way how revert was handled is not "Apache way”.

It may be easier for you to handle it in a single PR, but it is easier for me 
to provide PR on top of existing changes and it will also provide JIRA and a 
commit that documents why there is an issue. Otherwise it will be easier to 
introduce the error back again.

Thank you,

Vlad


On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, this broke the ma

performance issue Spark 3.5.2 on kubernetes

2025-03-26 Thread Prem Sahoo
Hello Team,
I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object
storage . It was slower when compared to writing to MapR FS with the above
tech stack. Then moved on to a later upgraded version of Spark 3.5.2 and
Hadoop 4.3.1 which started writing to MinIO with V2 fileoutputcommitter and
checked the performance which is worse than old tech stack. Then tried
using magic committer and it came out slower than V2 so with the latest
tech stack the performance is downgraded. With Spark 3.2 and Hadoop 2.7.6 I
was using Spark StandAlone cluster but with Spark 3.5.2 and Hadoop 3.4.1
used Kuberentes as cluster master.


Could some please assist .


Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-26 Thread Rozov, Vlad
This is what my WIP PR targets. It will help to identify any compatibility or 
breaking issues with the new dependency.

Thank you,

Vlad

On Mar 26, 2025, at 3:14 AM, Mich Talebzadeh  wrote:

Because of dependencies we need to ensure that the underlying artifacts (Hive 
4.0.1) is also stable enough. We should aim to establish that first and look 
for release timelines and where it fits

cheers

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

 
[https://ci3.googleusercontent.com/mail-sig/AIorK4zholKucR2Q9yMrKbHNn-o1TuS4mYXyi2KO6Xmx6ikHPySa9MLaLZ8t2hrA6AUcxSxDgHIwmKE]
   view my Linkedin 
profile




On Wed, 26 Mar 2025 at 06:02, Rozov, Vlad  wrote:
I started working on it. See https://github.com/apache/spark/pull/50213. Review 
and comments on the PR will help a lot.

+1 for 4.1. It won’t be ready for 4.0 and will require extensive testing.

I have few more local changes that fixes some tests in sql/hive and should 
publish another revision soon.

Thank you,

Vlad


On Mar 25, 2025, at 10:12 PM, Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:

I agree, 4.0 is already in the RC stage and I think it's too late to do such a 
big version bump for the Hive dependency.

We definitely need to do this upgrade and thanks for working on it!

On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua 
mailto:angel.alvarez.pas...@gmail.com>> wrote:

That's great news! If you need anything from me, just ask...

We should also check and update other non-Hive third-party libraries with 
high/critical vulnerabilities, as someone mentioned in another email thread.

Since this is a major change, I think we should leave it for Spark 4.1. What do 
you think?

El lun, 24 mar 2025 a las 0:59, Mich Talebzadeh 
(mailto:mich.talebza...@gmail.com>>) escribió:
For now, I am testing apache-hive-4.0.1-bin which is the latest release version 
 from

https://dlcdn.apache.org/hive/hive-4.0.1/

apache-hive-4.0.1-bin.tar

My metastore is Oracle and upgrade scripts are provided..

My previous version is Hive 3.1.1 and the metastore upgrade went OK without any 
major headache.

Now I just need to customise various files under $HIVE_HOME/conf and then I 
will have some testing underway.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

 
[https://ci3.googleusercontent.com/mail-sig/AIorK4zholKucR2Q9yMrKbHNn-o1TuS4mYXyi2KO6Xmx6ikHPySa9MLaLZ8t2hrA6AUcxSxDgHIwmKE]
   view my Linkedin 
profile




On Sun, 23 Mar 2025 at 17:13, Ángel Álvarez Pascua 
mailto:angel.alvarez.pas...@gmail.com>> wrote:
Well ... and then? When are we going to tackle this? I could help.

El mié, 12 mar 2025, 15:50, Mich Talebzadeh 
mailto:mich.talebza...@gmail.com>> escribió:
Agreed. Hive upgrade is more time consuming as it involves backing up Hive 
schema on your metastore and then running Hive provided upgrade schema scripts 
against Hive schema that could be problematic,but needs to be done one way or 
another.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

 
[https://ci3.googleusercontent.com/mail-sig/AIorK4zholKucR2Q9yMrKbHNn-o1TuS4mYXyi2KO6Xmx6ikHPySa9MLaLZ8t2hrA6AUcxSxDgHIwmKE]
   view my Linkedin 
profile




On Wed, 12 Mar 2025 at 12:21, Ángel 
mailto:angel.alvarez.pas...@gmail.com>> wrote:
Not an easy task, I guess, but I'm totally for it too.

The issue SPARK-49910 is 
related to this.

El mar, 11 mar 2025 a las 23:06, Mich Talebzadeh 
(mailto:mich.talebza...@gmail.com>>) escribió:
Yes I am all  for it, as I use Hive with Oracle as its metastore extensively.

Case in point, on 6th March A Hive 
user alluded 
to it and I quote

"I just wanted to highlight that Hive 3.x line is EOL. It has various known 
security vulnerabilities, many serious bugs (including wrong results and data 
corruption), and lacks lots of improvements and major features that are 
available in Hive 4. Upgrading is the right path forward."

 In summary, Hive 4.x likely includes performance improvements, new features, 
and bug fixes. Compiling against it would allow Spark to take advantage of 
these. Plus using the latest versions of both Spark and Hive is important for 
maintaining a secure data platform.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

 
[https://ci3.googleusercontent.com/mail-sig/AIorK4zholKucR2Q9yMrKbHNn-o1TuS4mYXyi2KO6Xmx6ikHPySa9MLaLZ8t2hrA6AUcxSxDgHIwmKE]
   view my Linkedin 
profile




On Tue, 11 Mar 2025 at 19:08, Rozov, Vlad  wrote:
Hi All,

As Apache Hive announced EOL for Hive 2.x [1] and 3.x [2], should Spark be 
compiled against Hive 4.x an

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Reynold Xin
My advice to you Vlad is that it would be more fruitful to focus on fixing
the issue than being extremely dogmatic and wasting everybody’s energy
arguing about this.

Of course, you are welcome to form your own opinion.


On Wed, Mar 26, 2025 at 7:38 AM Rozov, Vlad 
wrote:

> Reynold, I am not sure I follow your question. I’ll open PR with the fix
> once JIRA is open.
>
> While I am new to the Spark community, I am not new to the Apache projects
> and open source. Committers are guardians for commits and they keep not
> only master branch, but the entire source code in shape by reviewing PRs
> and committing the code. Once code is committed any changes should still go
> through the Apache process. It is not OK to revert a commit that was merged
> 3 weeks ago and did not block anyone for 3 weeks saying that the snapshot
> is useless.
>
> Thank you,
>
> Vlad
>
>
> On Mar 25, 2025, at 11:06 PM, Reynold Xin 
> wrote:
>
> Sorry Vlad - I disagree. Where is the simple fix? As a new contributor,
> you should not be coming in guns blazing blaming committers who are trying
> to keep the master branch sane and clean.
>
> On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad 
> wrote:
>
>> There is a simple fix. This is exactly what I outlined in the e-mail.
>> Prior to reverting commit (on master) it was necessary to see if an easy
>> fix exists. The PR that introduced the error was merged into master 3
>> weeks ago, so I still don’t get why it was reverted overnight. It was
>> also necessary to open JIRA and provide reproducible steps. I assume that
>> steps on the PR is what needs to be fixed, but it will be better to avoid
>> guessing.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 25, 2025, at 10:05 PM, Reynold Xin 
>> wrote:
>>
>> Is there a fix already available or a very simple fix a committer can
>> create quickly? If yes, we can merge the fix. If there isn't, for major
>> functionality breaking change, we should just revert. That's fairly basic
>> software engineering practices.
>>
>>
>> On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
>> wrote:
>>
>>> With the change, the main entry points, Spark shalls, don't work and
>>> developers cannot debug and test. The snapshots become uesless.
>>>
>>> The tests passed because you did not fix SBT. It needs a larger change.
>>>
>>> Such change cannot be in the source. I can start a vote if you think
>>> this is an issue.
>>>
>>>
>>> On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad 
>>> wrote:
>>>
 This does not make any sense.

 1. There are no broken tests introduced by
 https://github.com/apache/spark/pull/49971
 2. There are no JIRA filed for “the main entry point”
 3. “The main entry point” that does not have any unit test suggests
 that it is not the main entry point.
 4. It is not practical to revert every commit that introduced some
 functional issue or regression. If regression can’t be fixed, it is
 reverted, if it can be reasonably fixed, the fix is applied to the
 regression. Just as an example,
 https://github.com/apache/spark/pull/42823 introduced a regression. It
 was fixed by https://github.com/apache/spark/pull/50348. Nobody asked
 for the revert.
 5. The way how revert was handled is not "Apache way”.

 It may be easier for you to handle it in a single PR, but it is easier
 for me to provide PR on top of existing changes and it will also provide
 JIRA and a commit that documents why there is an issue. Otherwise it will
 be easier to introduce the error back again.

 Thank you,

 Vlad


 On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon  wrote:

 Rozov, this broke the main entry points of release, Spark shells. Even
 in the mast branch, you build a Spark, and cannot use Spark shells.
 Why don't you submit a PR that contains the proper fix? It is easier to
 have one PR that has no issue, e.g., reverting backporting etc.

 On Wed, 26 Mar 2025 at 00:17, Rozov, Vlad 
 wrote:

> Hi All,
>
> I kind of understand why https://github.com/apache/spark/pull/49971 was
> reverted on the branch-4.0 to allow testing of 4.0 release. Why was it 
> also
> reverted on the master branch? I don’t see any JIRA being open for the
> failure. AFAIK, the proper way to handle the issue in Apache project:
>
> - revert  https://github.com/apache/spark/pull/49971 on the
> branch-4.0 (note that PR targeted master branch and SPARK-51229 targets 
> 4.1
> as affected version)
> - open JIRA and provide details of how to reproduce the error on the
> master branch
> - wait for the author of https://github.com/apache/spark/pull/49971 (in
> this case it was me) to reply in a timely manner and only then revert the
> commit
>
> If you agree with the approach, please undo the revert and file JIRA.
> There is an easy fix to the issue.
>
> Thank you,
>
> Vlad
>
>
>

>>
>


Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Reynold, I am not sure I follow your question. I’ll open PR with the fix once 
JIRA is open.

While I am new to the Spark community, I am not new to the Apache projects and 
open source. Committers are guardians for commits and they keep not only master 
branch, but the entire source code in shape by reviewing PRs and committing the 
code. Once code is committed any changes should still go through the Apache 
process. It is not OK to revert a commit that was merged 3 weeks ago and did 
not block anyone for 3 weeks saying that the snapshot is useless.

Thank you,

Vlad

On Mar 25, 2025, at 11:06 PM, Reynold Xin  wrote:

Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you 
should not be coming in guns blazing blaming committers who are trying to keep 
the master branch sane and clean.

On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad  wrote:
There is a simple fix. This is exactly what I outlined in the e-mail. Prior to 
reverting commit (on master) it was necessary to see if an easy fix exists. The 
PR that introduced the error was merged into master 3 weeks ago, so I still 
don’t get why it was reverted overnight. It was also necessary to open JIRA and 
provide reproducible steps. I assume that steps on the PR is what needs to be 
fixed, but it will be better to avoid guessing.

Thank you,

Vlad

On Mar 25, 2025, at 10:05 PM, Reynold Xin  wrote:

Is there a fix already available or a very simple fix a committer can create 
quickly? If yes, we can merge the fix. If there isn't, for major functionality 
breaking change, we should just revert. That's fairly basic software 
engineering practices.


On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:
With the change, the main entry points, Spark shalls, don't work and developers 
cannot debug and test. The snapshots become uesless.

The tests passed because you did not fix SBT. It needs a larger change.

Such change cannot be in the source. I can start a vote if you think this is an 
issue.


On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad  wrote:
This does not make any sense.

1. There are no broken tests introduced by 
https://github.com/apache/spark/pull/49971
2. There are no JIRA filed for “the main entry point”
3. “The main entry point” that does not have any unit test suggests that it is 
not the main entry point.
4. It is not practical to revert every commit that introduced some functional 
issue or regression. If regression can’t be fixed, it is reverted, if it can be 
reasonably fixed, the fix is applied to the regression. Just as an example, 
https://github.com/apache/spark/pull/42823 introduced a regression. It was 
fixed by https://github.com/apache/spark/pull/50348. Nobody asked for the 
revert.
5. The way how revert was handled is not "Apache way”.

It may be easier for you to handle it in a single PR, but it is easier for me 
to provide PR on top of existing changes and it will also provide JIRA and a 
commit that documents why there is an issue. Otherwise it will be easier to 
introduce the error back again.

Thank you,

Vlad


On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, this broke the main entry points of release, Spark shells. Even in the 
mast branch, you build a Spark, and cannot use Spark shells.
Why don't you submit a PR that contains the proper fix? It is easier to have 
one PR that has no issue, e.g., reverting backporting etc.

On Wed, 26 Mar 2025 at 00:17, Rozov, Vlad  wrote:
Hi All,

I kind of understand why https://github.com/apache/spark/pull/49971 was 
reverted on the branch-4.0 to allow testing of 4.0 release. Why was it also 
reverted on the master branch? I don’t see any JIRA being open for the failure. 
AFAIK, the proper way to handle the issue in Apache project:

- revert  https://github.com/apache/spark/pull/49971 on the branch-4.0 (note 
that PR targeted master branch and SPARK-51229 targets 4.1 as affected version)
- open JIRA and provide details of how to reproduce the error on the master 
branch
- wait for the author of https://github.com/apache/spark/pull/49971 (in this 
case it was me) to reply in a timely manner and only then revert the commit

If you agree with the approach, please undo the revert and file JIRA. There is 
an easy fix to the issue.

Thank you,

Vlad







Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Nicholas Chammas
On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad  wrote:
> Every graduated from incubating Apache project has guards against what you 
> name “chaotic” and what other name breaking best development practices. Such 
> guards include JIRA, unit tests and PR review. Instead of reverting commit, I 
> would expect you to open JIRA and outline what is broken. If you filed JIRA, 
> please provide the link, if not, please open one.
> 
> It took me 2 hours to fix Spark shells, so should you open JIRA instead of 
> spending time to identify the commit and reverting it, you will save time as 
> well. I’ll post fix once JIRA is open and I validate that my understanding of 
> what is broken is the same as yours.


Just for the record (to Vlad and others new to the Spark project), it is normal 
for committers to revert commits that turn out to break something. Not every 
problem is caught during PR review or CI.

It happened to me a couple of times:
- Reverted because my change broke the release scripts: 
https://github.com/apache/spark/pull/27534#pullrequestreview-384452843
- Reverted because my change broke 3.5 but not 4.0: 
https://github.com/apache/spark/pull/45036#issuecomment-1967045615

And when that happens, it’s usually not a big deal. The contributor usually 
just tries to submit their patch again, incorporating whatever lessons learned 
from the breakage.

I don’t see why this case should be any different from our norm.

Nick



Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Jungtaek Lim
+1 on explanation that it is not happening only to Vlad but always
happening as a normal process.

Vlad, if we are very strict about ASF voting policy, we have to have
three +1s without -1 to merge the code change. I don't think the major
projects in ASF follow it - instead, they (including Spark) allow one +1 to
merge the code changes. This is actually based on "lazy consensus" - if the
code has merged and later day the code change got -1, this is something we
expand the vote to that period of time and try to address -1, otherwise
revert. Without open minded to lazy consensus, we will always have to do 3
days of VOTE with 3 +1s to merge every single code change.

It is maybe arguable that someone can claim that lazy consensus does not
mean the lifetime of a vote is infinite. But I'd like to see us focus on
the issues.

Also, you pull my PR as an example, but the huge difference here is, I am
almost "sole" maintainer of SS, and I'm known to be available to handle the
tasks happening in the OSS community daily manner. So even if we figure out
the issue on my PR later, it is almost guaranteed that I will work on the
issue right away, depending on the severity. This leads community members
(especially committers+) to believe me that I will fix forward or ask for
reverting if it cannot be fixed. It is a matter of an individual's
reliability in the community.

On Thu, Mar 27, 2025 at 2:12 AM Nicholas Chammas 
wrote:

> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
> wrote:
>
>> Every graduated from incubating Apache project has guards against what
>> you name “chaotic” and what other name breaking best development practices.
>> Such guards include JIRA, unit tests and PR review. Instead of reverting
>> commit, I would expect you to open JIRA and outline what is broken. If you
>> filed JIRA, please provide the link, if not, please open one.
>>
>> It took me 2 hours to fix Spark shells, so should you open JIRA instead
>> of spending time to identify the commit and reverting it, you will save
>> time as well. I’ll post fix once JIRA is open and I validate that my
>> understanding of what is broken is the same as yours.
>>
>
> Just for the record (to Vlad and others new to the Spark project), it is
> normal for committers to revert commits that turn out to break something.
> Not every problem is caught during PR review or CI.
>
> It happened to me a couple of times:
> - Reverted because my change broke the release scripts:
> https://github.com/apache/spark/pull/27534#pullrequestreview-384452843
> - Reverted because my change broke 3.5 but not 4.0:
> https://github.com/apache/spark/pull/45036#issuecomment-1967045615
>
> And when that happens, it’s usually not a big deal. The contributor
> usually just tries to submit their patch again, incorporating whatever
> lessons learned from the breakage.
>
> I don’t see why this case should be any different from our norm.
>
> Nick
>
>


Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Wenchen Fan
A slightly off-topic but related question: It feels fragile to test with
SBT while publishing the release with Maven. How did we end up in this
situation? Moreover, since most Spark developers use SBT for their daily
work, it becomes even harder to catch issues with the Maven build.

On Thu, Mar 27, 2025 at 9:15 AM Hyukjin Kwon  wrote:

> Nah, I wasn't clear.
>
> Maven and SBT builds are synced for this special code path, e.g.,
> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
> .
> If you build Maven and SBT, the results are almost the same.
>
> Now, the fix you landed in Maven (and indeed it was a Maven specific fix).
> However, it changes the syncing part between Maven and SBT, and it results
> in a different output.
> The CI in the PR passed because we run SBT in PR builders.
>
> My suggestion is to fix both at the same time as usual PRs have been so
> also considering that the fix had an issue.
> It would also need some closer looks who are used to SBT as well to make
> sure the same issue does not happen.
>
>
>
> On Thu, 27 Mar 2025 at 10:06, Rozov, Vlad 
> wrote:
>
>> Sorry, but I still don’t follow. My PR broke Maven and the fix I provided
>> fixes Maven. SBT was never broken except there is inconsistency between SBT
>> and Maven builds. Can the inconsistency be fixed in a follow up PR?
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon  wrote:
>>
>> It is not broken ... because we run SBT in PR builders for ASF resource
>> restrictions and faster build. We use Maven for release so it was found out
>> now.
>>
>> CI did not test your change. The part you are fixing is a special path ..
>>
>> On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad 
>> wrote:
>>
>>> If it is not broken, can the sync between maven and SBT
>>> dependencies/shadow be done in a follow up PR?
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon  wrote:
>>>
>>> It is not broken. The fix you applied would not be applied in SBT. For
>>> example, the lines you changed (added in
>>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
>>> ):
>>>
>>> diff```
>>> -
>>> -  com.google.common
>>> -
>>> ${spark.shade.packageName}.connect.guava
>>> -  
>>> -com.google.common.**
>>> -  
>>> -
>>> ```
>>>
>>> The companion part of this has to be removed in SBT as well, for example
>>> here
>>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674
>>>
>>> Adding simple dependency changes usually are shared between SBT and
>>> Maven so it is not a tricky part. However, some changes like this have to
>>> be made in both places.
>>> Both Maven and SBT builds are usually sync, and changes are mode
>>> together. Let me know if I am underding your change wrongly.
>>>
>>>
>>> On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad 
>>> wrote:
>>>
 Hi Hyukjin,

 Can you clarify what is broken in SBT? As I already mentioned in my
 previous e-mail, I tested both maven and SBT builds and both works. Is
 there a test that I can run that shows what is broken? Please file JIRA, so
 we can communicate on the JIRA and document what is broken.

 Thank you,

 Vlad

 On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon  wrote:

 That only fixes Maven. Both SBT build and Maven build should work in
 the same or similar wat. Let's make sure both work.

 On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad 
 wrote:

> Please see https://github.com/vrozov/spark/tree/spark-shell. I tested
> only spark-shell —remote local after building with maven and sbt. It may
> not be a complete fix and there is no PR. I’ll look into SBT build issue
> (assuming that there is still one after the fix) once you file JIRA.
>
> Thank you,
>
> Vlad
>
>
> On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon 
> wrote:
>
> Vlad,
>
> - Please show me if there is a simple fix. If that's the case, yes, I
> will revert this out from the master branch. That works for me.
> - If not, let's make a new PR.
> - If you feel this is an issue, let's start a vote. Let me know.
>
>
> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
> wrote:
>
>> Every graduated from incubating Apache project has guards against
>> what you name “chaotic” and what other name breaking best development
>> practices. Such guards include JIRA, unit tests and PR review. Instead of
>> reverting commit, I would expect you to open JIRA and outline what is
>> broken. If you filed JIRA, please provide the link, if not, please open
>> one.
>>
>> It took me 2 hours to fix Spark shells, so should you open JIRA
>> instead of spending time to identify the commit and reverting it, you 
>> will
>

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
That only fixes Maven. Both SBT build and Maven build should work in the
same or similar wat. Let's make sure both work.

On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad 
wrote:

> Please see https://github.com/vrozov/spark/tree/spark-shell. I tested
> only spark-shell —remote local after building with maven and sbt. It may
> not be a complete fix and there is no PR. I’ll look into SBT build issue
> (assuming that there is still one after the fix) once you file JIRA.
>
> Thank you,
>
> Vlad
>
>
> On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon  wrote:
>
> Vlad,
>
> - Please show me if there is a simple fix. If that's the case, yes, I will
> revert this out from the master branch. That works for me.
> - If not, let's make a new PR.
> - If you feel this is an issue, let's start a vote. Let me know.
>
>
> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
> wrote:
>
>> Every graduated from incubating Apache project has guards against what
>> you name “chaotic” and what other name breaking best development practices.
>> Such guards include JIRA, unit tests and PR review. Instead of reverting
>> commit, I would expect you to open JIRA and outline what is broken. If you
>> filed JIRA, please provide the link, if not, please open one.
>>
>> It took me 2 hours to fix Spark shells, so should you open JIRA instead
>> of spending time to identify the commit and reverting it, you will save
>> time as well. I’ll post fix once JIRA is open and I validate that my
>> understanding of what is broken is the same as yours.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon  wrote:
>>
>> Rozov, please test the patch, see if there is a relevant test or not, and
>> add a test if not there. If it is difficult to add a test, describe it in
>> the PR description, and how you manually tested.
>> This is what I think you need to do instead of reverting the revert.
>> Imagine that there are many of such PRs living in the master branch.
>> Thankfully this time you are alone but imagine many people do the same
>> things - it will be chaotic. To me, it took 4 hours to identify this single
>> commit.
>>
>> About the change itself, you need to fix SBT together, and it will need
>> the changes such as
>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
>> .
>> Should also test Spark shells, and describe how you tested it as well.
>>
>> This is what I expect:
>> - Please show me if there is a simple fix. If that's the case, yes, I
>> will revert this out from the master branch. That works for me.
>> - If not, let's make a new PR.
>> - If you feel this is an issue, let's start a vote. Let me know.
>>
>>
>>
>> On Wed, 26 Mar 2025 at 15:07, Reynold Xin 
>> wrote:
>>
>>> Sorry Vlad - I disagree. Where is the simple fix? As a new contributor,
>>> you should not be coming in guns blazing blaming committers who are trying
>>> to keep the master branch sane and clean.
>>>
>>> On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad 
>>> wrote:
>>>
 There is a simple fix. This is exactly what I outlined in the e-mail.
 Prior to reverting commit (on master) it was necessary to see if an easy
 fix exists. The PR that introduced the error was merged into master 3
 weeks ago, so I still don’t get why it was reverted overnight. It was
 also necessary to open JIRA and provide reproducible steps. I assume that
 steps on the PR is what needs to be fixed, but it will be better to avoid
 guessing.

 Thank you,

 Vlad

 On Mar 25, 2025, at 10:05 PM, Reynold Xin 
 wrote:

 Is there a fix already available or a very simple fix a committer can
 create quickly? If yes, we can merge the fix. If there isn't, for major
 functionality breaking change, we should just revert. That's fairly basic
 software engineering practices.


 On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
 wrote:

> With the change, the main entry points, Spark shalls, don't work and
> developers cannot debug and test. The snapshots become uesless.
>
> The tests passed because you did not fix SBT. It needs a larger change.
>
> Such change cannot be in the source. I can start a vote if you think
> this is an issue.
>
>
> On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad 
> wrote:
>
>> This does not make any sense.
>>
>> 1. There are no broken tests introduced by
>> https://github.com/apache/spark/pull/49971
>> 2. There are no JIRA filed for “the main entry point”
>> 3. “The main entry point” that does not have any unit test suggests
>> that it is not the main entry point.
>> 4. It is not practical to revert every commit that introduced some
>> functional issue or regression. If regression can’t be fixed, it is
>> reverted, if it can be reasonably fixed, the fix is applied to the
>> regression. Just as an example,
>> https://github.com/apache/spark/pull/42823 introduced a regression.
>> 

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Every graduated from incubating Apache project has guards against what you name 
“chaotic” and what other name breaking best development practices. Such guards 
include JIRA, unit tests and PR review. Instead of reverting commit, I would 
expect you to open JIRA and outline what is broken. If you filed JIRA, please 
provide the link, if not, please open one.

It took me 2 hours to fix Spark shells, so should you open JIRA instead of 
spending time to identify the commit and reverting it, you will save time as 
well. I’ll post fix once JIRA is open and I validate that my understanding of 
what is broken is the same as yours.

Thank you,

Vlad

On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon  wrote:

Rozov, please test the patch, see if there is a relevant test or not, and add a 
test if not there. If it is difficult to add a test, describe it in the PR 
description, and how you manually tested.
This is what I think you need to do instead of reverting the revert.
Imagine that there are many of such PRs living in the master branch. Thankfully 
this time you are alone but imagine many people do the same things - it will be 
chaotic. To me, it took 4 hours to identify this single commit.

About the change itself, you need to fix SBT together, and it will need the 
changes such as 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0.
Should also test Spark shells, and describe how you tested it as well.

This is what I expect:
- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.



On Wed, 26 Mar 2025 at 15:07, Reynold Xin  wrote:
Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you 
should not be coming in guns blazing blaming committers who are trying to keep 
the master branch sane and clean.

On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad  wrote:
There is a simple fix. This is exactly what I outlined in the e-mail. Prior to 
reverting commit (on master) it was necessary to see if an easy fix exists. The 
PR that introduced the error was merged into master 3 weeks ago, so I still 
don’t get why it was reverted overnight. It was also necessary to open JIRA and 
provide reproducible steps. I assume that steps on the PR is what needs to be 
fixed, but it will be better to avoid guessing.

Thank you,

Vlad

On Mar 25, 2025, at 10:05 PM, Reynold Xin  wrote:

Is there a fix already available or a very simple fix a committer can create 
quickly? If yes, we can merge the fix. If there isn't, for major functionality 
breaking change, we should just revert. That's fairly basic software 
engineering practices.


On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:
With the change, the main entry points, Spark shalls, don't work and developers 
cannot debug and test. The snapshots become uesless.

The tests passed because you did not fix SBT. It needs a larger change.

Such change cannot be in the source. I can start a vote if you think this is an 
issue.


On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad  wrote:
This does not make any sense.

1. There are no broken tests introduced by 
https://github.com/apache/spark/pull/49971
2. There are no JIRA filed for “the main entry point”
3. “The main entry point” that does not have any unit test suggests that it is 
not the main entry point.
4. It is not practical to revert every commit that introduced some functional 
issue or regression. If regression can’t be fixed, it is reverted, if it can be 
reasonably fixed, the fix is applied to the regression. Just as an example, 
https://github.com/apache/spark/pull/42823 introduced a regression. It was 
fixed by https://github.com/apache/spark/pull/50348. Nobody asked for the 
revert.
5. The way how revert was handled is not "Apache way”.

It may be easier for you to handle it in a single PR, but it is easier for me 
to provide PR on top of existing changes and it will also provide JIRA and a 
commit that documents why there is an issue. Otherwise it will be easier to 
introduce the error back again.

Thank you,

Vlad


On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, this broke the main entry points of release, Spark shells. Even in the 
mast branch, you build a Spark, and cannot use Spark shells.
Why don't you submit a PR that contains the proper fix? It is easier to have 
one PR that has no issue, e.g., reverting backporting etc.

On Wed, 26 Mar 2025 at 00:17, Rozov, Vlad  wrote:
Hi All,

I kind of understand why https://github.com/apache/spark/pull/49971 was 
reverted on the branch-4.0 to allow testing of 4.0 release. Why was it also 
reverted on the master branch? I don’t see any JIRA being open for the failure. 
AFAIK, the proper way to handle the issue in Apache project:

- revert  https://github.com/apache/spark/pull/49971 on the branch-4.0 (

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
It is not broken. The fix you applied would not be applied in SBT. For
example, the lines you changed (added in
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
):

diff```
-
-  com.google.common
-
${spark.shade.packageName}.connect.guava
-  
-com.google.common.**
-  
-
```

The companion part of this has to be removed in SBT as well, for example
here
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674

Adding simple dependency changes usually are shared between SBT and Maven
so it is not a tricky part. However, some changes like this have to be made
in both places.
Both Maven and SBT builds are usually sync, and changes are mode together.
Let me know if I am underding your change wrongly.


On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad  wrote:

> Hi Hyukjin,
>
> Can you clarify what is broken in SBT? As I already mentioned in my
> previous e-mail, I tested both maven and SBT builds and both works. Is
> there a test that I can run that shows what is broken? Please file JIRA, so
> we can communicate on the JIRA and document what is broken.
>
> Thank you,
>
> Vlad
>
> On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon  wrote:
>
> That only fixes Maven. Both SBT build and Maven build should work in the
> same or similar wat. Let's make sure both work.
>
> On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad 
> wrote:
>
>> Please see https://github.com/vrozov/spark/tree/spark-shell. I tested
>> only spark-shell —remote local after building with maven and sbt. It may
>> not be a complete fix and there is no PR. I’ll look into SBT build issue
>> (assuming that there is still one after the fix) once you file JIRA.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon  wrote:
>>
>> Vlad,
>>
>> - Please show me if there is a simple fix. If that's the case, yes, I
>> will revert this out from the master branch. That works for me.
>> - If not, let's make a new PR.
>> - If you feel this is an issue, let's start a vote. Let me know.
>>
>>
>> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
>> wrote:
>>
>>> Every graduated from incubating Apache project has guards against what
>>> you name “chaotic” and what other name breaking best development practices.
>>> Such guards include JIRA, unit tests and PR review. Instead of reverting
>>> commit, I would expect you to open JIRA and outline what is broken. If you
>>> filed JIRA, please provide the link, if not, please open one.
>>>
>>> It took me 2 hours to fix Spark shells, so should you open JIRA instead
>>> of spending time to identify the commit and reverting it, you will save
>>> time as well. I’ll post fix once JIRA is open and I validate that my
>>> understanding of what is broken is the same as yours.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon  wrote:
>>>
>>> Rozov, please test the patch, see if there is a relevant test or not,
>>> and add a test if not there. If it is difficult to add a test, describe it
>>> in the PR description, and how you manually tested.
>>> This is what I think you need to do instead of reverting the revert.
>>> Imagine that there are many of such PRs living in the master branch.
>>> Thankfully this time you are alone but imagine many people do the same
>>> things - it will be chaotic. To me, it took 4 hours to identify this single
>>> commit.
>>>
>>> About the change itself, you need to fix SBT together, and it will need
>>> the changes such as
>>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
>>> .
>>> Should also test Spark shells, and describe how you tested it as well.
>>>
>>> This is what I expect:
>>> - Please show me if there is a simple fix. If that's the case, yes, I
>>> will revert this out from the master branch. That works for me.
>>> - If not, let's make a new PR.
>>> - If you feel this is an issue, let's start a vote. Let me know.
>>>
>>>
>>>
>>> On Wed, 26 Mar 2025 at 15:07, Reynold Xin 
>>> wrote:
>>>
 Sorry Vlad - I disagree. Where is the simple fix? As a new contributor,
 you should not be coming in guns blazing blaming committers who are trying
 to keep the master branch sane and clean.

 On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad 
 wrote:

> There is a simple fix. This is exactly what I outlined in the e-mail.
> Prior to reverting commit (on master) it was necessary to see if an easy
> fix exists. The PR that introduced the error was merged into master 3
> weeks ago, so I still don’t get why it was reverted overnight. It was
> also necessary to open JIRA and provide reproducible steps. I assume that
> steps on the PR is what needs to be fixed, but it will be better to avoid
> guessing.
>
> Thank you,
>
> Vlad
>
> On Mar 25, 2025, at 10:05 PM, Reynold Xin 
> wrote:

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
It is not broken ... because we run SBT in PR builders for ASF resource
restrictions and faster build. We use Maven for release so it was found out
now.

CI did not test your change. The part you are fixing is a special path ..

On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad 
wrote:

> If it is not broken, can the sync between maven and SBT
> dependencies/shadow be done in a follow up PR?
>
> Thank you,
>
> Vlad
>
>
> On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon  wrote:
>
> It is not broken. The fix you applied would not be applied in SBT. For
> example, the lines you changed (added in
> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
> ):
>
> diff```
> -
> -  com.google.common
> -
> ${spark.shade.packageName}.connect.guava
> -  
> -com.google.common.**
> -  
> -
> ```
>
> The companion part of this has to be removed in SBT as well, for example
> here
> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674
>
> Adding simple dependency changes usually are shared between SBT and Maven
> so it is not a tricky part. However, some changes like this have to be made
> in both places.
> Both Maven and SBT builds are usually sync, and changes are mode together.
> Let me know if I am underding your change wrongly.
>
>
> On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad 
> wrote:
>
>> Hi Hyukjin,
>>
>> Can you clarify what is broken in SBT? As I already mentioned in my
>> previous e-mail, I tested both maven and SBT builds and both works. Is
>> there a test that I can run that shows what is broken? Please file JIRA, so
>> we can communicate on the JIRA and document what is broken.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon  wrote:
>>
>> That only fixes Maven. Both SBT build and Maven build should work in the
>> same or similar wat. Let's make sure both work.
>>
>> On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad 
>> wrote:
>>
>>> Please see https://github.com/vrozov/spark/tree/spark-shell. I tested
>>> only spark-shell —remote local after building with maven and sbt. It may
>>> not be a complete fix and there is no PR. I’ll look into SBT build issue
>>> (assuming that there is still one after the fix) once you file JIRA.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon  wrote:
>>>
>>> Vlad,
>>>
>>> - Please show me if there is a simple fix. If that's the case, yes, I
>>> will revert this out from the master branch. That works for me.
>>> - If not, let's make a new PR.
>>> - If you feel this is an issue, let's start a vote. Let me know.
>>>
>>>
>>> On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
>>> wrote:
>>>
 Every graduated from incubating Apache project has guards against what
 you name “chaotic” and what other name breaking best development practices.
 Such guards include JIRA, unit tests and PR review. Instead of reverting
 commit, I would expect you to open JIRA and outline what is broken. If you
 filed JIRA, please provide the link, if not, please open one.

 It took me 2 hours to fix Spark shells, so should you open JIRA instead
 of spending time to identify the commit and reverting it, you will save
 time as well. I’ll post fix once JIRA is open and I validate that my
 understanding of what is broken is the same as yours.

 Thank you,

 Vlad

 On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon 
 wrote:

 Rozov, please test the patch, see if there is a relevant test or not,
 and add a test if not there. If it is difficult to add a test, describe it
 in the PR description, and how you manually tested.
 This is what I think you need to do instead of reverting the revert.
 Imagine that there are many of such PRs living in the master branch.
 Thankfully this time you are alone but imagine many people do the same
 things - it will be chaotic. To me, it took 4 hours to identify this single
 commit.

 About the change itself, you need to fix SBT together, and it will need
 the changes such as
 https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
 .
 Should also test Spark shells, and describe how you tested it as well.

 This is what I expect:
 - Please show me if there is a simple fix. If that's the case, yes, I
 will revert this out from the master branch. That works for me.
 - If not, let's make a new PR.
 - If you feel this is an issue, let's start a vote. Let me know.



 On Wed, 26 Mar 2025 at 15:07, Reynold Xin 
 wrote:

> Sorry Vlad - I disagree. Where is the simple fix? As a new
> contributor, you should not be coming in guns blazing blaming committers
> who are trying to keep the master branch sane and clean.
>
> On Tue, Mar 25, 2025 at 10:53 PM 

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Sorry, but I still don’t follow. My PR broke Maven and the fix I provided fixes 
Maven. SBT was never broken except there is inconsistency between SBT and Maven 
builds. Can the inconsistency be fixed in a follow up PR?

Thank you,

Vlad

On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon  wrote:

It is not broken ... because we run SBT in PR builders for ASF resource 
restrictions and faster build. We use Maven for release so it was found out now.

CI did not test your change. The part you are fixing is a special path ..

On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad  wrote:
If it is not broken, can the sync between maven and SBT dependencies/shadow be 
done in a follow up PR?

Thank you,

Vlad


On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

It is not broken. The fix you applied would not be applied in SBT. For example, 
the lines you changed (added in 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0):

diff```
-
-  com.google.common
-  
${spark.shade.packageName}.connect.guava
-  
-com.google.common.**
-  
-
```

The companion part of this has to be removed in SBT as well, for example here 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674

Adding simple dependency changes usually are shared between SBT and Maven so it 
is not a tricky part. However, some changes like this have to be made in both 
places.
Both Maven and SBT builds are usually sync, and changes are mode together. Let 
me know if I am underding your change wrongly.


On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad  wrote:
Hi Hyukjin,

Can you clarify what is broken in SBT? As I already mentioned in my previous 
e-mail, I tested both maven and SBT builds and both works. Is there a test that 
I can run that shows what is broken? Please file JIRA, so we can communicate on 
the JIRA and document what is broken.

Thank you,

Vlad

On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

That only fixes Maven. Both SBT build and Maven build should work in the same 
or similar wat. Let's make sure both work.

On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad  wrote:
Please see https://github.com/vrozov/spark/tree/spark-shell. I tested only 
spark-shell —remote local after building with maven and sbt. It may not be a 
complete fix and there is no PR. I’ll look into SBT build issue (assuming that 
there is still one after the fix) once you file JIRA.

Thank you,

Vlad


On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Vlad,

- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.


On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad  wrote:
Every graduated from incubating Apache project has guards against what you name 
“chaotic” and what other name breaking best development practices. Such guards 
include JIRA, unit tests and PR review. Instead of reverting commit, I would 
expect you to open JIRA and outline what is broken. If you filed JIRA, please 
provide the link, if not, please open one.

It took me 2 hours to fix Spark shells, so should you open JIRA instead of 
spending time to identify the commit and reverting it, you will save time as 
well. I’ll post fix once JIRA is open and I validate that my understanding of 
what is broken is the same as yours.

Thank you,

Vlad

On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, please test the patch, see if there is a relevant test or not, and add a 
test if not there. If it is difficult to add a test, describe it in the PR 
description, and how you manually tested.
This is what I think you need to do instead of reverting the revert.
Imagine that there are many of such PRs living in the master branch. Thankfully 
this time you are alone but imagine many people do the same things - it will be 
chaotic. To me, it took 4 hours to identify this single commit.

About the change itself, you need to fix SBT together, and it will need the 
changes such as 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0.
Should also test Spark shells, and describe how you tested it as well.

This is what I expect:
- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.



On Wed, 26 Mar 2025 at 15:07, Reynold Xin  wrote:
Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you 
should not be coming in guns blazing blaming committers who are trying to keep 
the master branch sane and clean.

On Tue, Mar 25, 2025 at 

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
If it is not broken, can the sync between maven and SBT dependencies/shadow be 
done in a follow up PR?

Thank you,

Vlad

On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon  wrote:

It is not broken. The fix you applied would not be applied in SBT. For example, 
the lines you changed (added in 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0):

diff```
-
-  com.google.common
-  
${spark.shade.packageName}.connect.guava
-  
-com.google.common.**
-  
-
```

The companion part of this has to be removed in SBT as well, for example here 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674

Adding simple dependency changes usually are shared between SBT and Maven so it 
is not a tricky part. However, some changes like this have to be made in both 
places.
Both Maven and SBT builds are usually sync, and changes are mode together. Let 
me know if I am underding your change wrongly.


On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad  wrote:
Hi Hyukjin,

Can you clarify what is broken in SBT? As I already mentioned in my previous 
e-mail, I tested both maven and SBT builds and both works. Is there a test that 
I can run that shows what is broken? Please file JIRA, so we can communicate on 
the JIRA and document what is broken.

Thank you,

Vlad

On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

That only fixes Maven. Both SBT build and Maven build should work in the same 
or similar wat. Let's make sure both work.

On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad  wrote:
Please see https://github.com/vrozov/spark/tree/spark-shell. I tested only 
spark-shell —remote local after building with maven and sbt. It may not be a 
complete fix and there is no PR. I’ll look into SBT build issue (assuming that 
there is still one after the fix) once you file JIRA.

Thank you,

Vlad


On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Vlad,

- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.


On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad  wrote:
Every graduated from incubating Apache project has guards against what you name 
“chaotic” and what other name breaking best development practices. Such guards 
include JIRA, unit tests and PR review. Instead of reverting commit, I would 
expect you to open JIRA and outline what is broken. If you filed JIRA, please 
provide the link, if not, please open one.

It took me 2 hours to fix Spark shells, so should you open JIRA instead of 
spending time to identify the commit and reverting it, you will save time as 
well. I’ll post fix once JIRA is open and I validate that my understanding of 
what is broken is the same as yours.

Thank you,

Vlad

On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon 
mailto:gurwls...@apache.org>> wrote:

Rozov, please test the patch, see if there is a relevant test or not, and add a 
test if not there. If it is difficult to add a test, describe it in the PR 
description, and how you manually tested.
This is what I think you need to do instead of reverting the revert.
Imagine that there are many of such PRs living in the master branch. Thankfully 
this time you are alone but imagine many people do the same things - it will be 
chaotic. To me, it took 4 hours to identify this single commit.

About the change itself, you need to fix SBT together, and it will need the 
changes such as 
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0.
Should also test Spark shells, and describe how you tested it as well.

This is what I expect:
- Please show me if there is a simple fix. If that's the case, yes, I will 
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.



On Wed, 26 Mar 2025 at 15:07, Reynold Xin  wrote:
Sorry Vlad - I disagree. Where is the simple fix? As a new contributor, you 
should not be coming in guns blazing blaming committers who are trying to keep 
the master branch sane and clean.

On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad  wrote:
There is a simple fix. This is exactly what I outlined in the e-mail. Prior to 
reverting commit (on master) it was necessary to see if an easy fix exists. The 
PR that introduced the error was merged into master 3 weeks ago, so I still 
don’t get why it was reverted overnight. It was also necessary to open JIRA and 
provide reproducible steps. I assume that steps on the PR is what needs to be 
fixed, but it will be better to avoid guessing.

Thank you,

Vlad

On Mar 25, 2025, at 10:05 PM, Reynold Xin  wrote:

Is there a fix already available or a very simple fix a co

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Nah, I wasn't clear.

Maven and SBT builds are synced for this special code path, e.g.,
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
.
If you build Maven and SBT, the results are almost the same.

Now, the fix you landed in Maven (and indeed it was a Maven specific fix).
However, it changes the syncing part between Maven and SBT, and it results
in a different output.
The CI in the PR passed because we run SBT in PR builders.

My suggestion is to fix both at the same time as usual PRs have been so
also considering that the fix had an issue.
It would also need some closer looks who are used to SBT as well to make
sure the same issue does not happen.



On Thu, 27 Mar 2025 at 10:06, Rozov, Vlad  wrote:

> Sorry, but I still don’t follow. My PR broke Maven and the fix I provided
> fixes Maven. SBT was never broken except there is inconsistency between SBT
> and Maven builds. Can the inconsistency be fixed in a follow up PR?
>
> Thank you,
>
> Vlad
>
> On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon  wrote:
>
> It is not broken ... because we run SBT in PR builders for ASF resource
> restrictions and faster build. We use Maven for release so it was found out
> now.
>
> CI did not test your change. The part you are fixing is a special path ..
>
> On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad 
> wrote:
>
>> If it is not broken, can the sync between maven and SBT
>> dependencies/shadow be done in a follow up PR?
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon  wrote:
>>
>> It is not broken. The fix you applied would not be applied in SBT. For
>> example, the lines you changed (added in
>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
>> ):
>>
>> diff```
>> -
>> -  com.google.common
>> -
>> ${spark.shade.packageName}.connect.guava
>> -  
>> -com.google.common.**
>> -  
>> -
>> ```
>>
>> The companion part of this has to be removed in SBT as well, for example
>> here
>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674
>>
>> Adding simple dependency changes usually are shared between SBT and Maven
>> so it is not a tricky part. However, some changes like this have to be made
>> in both places.
>> Both Maven and SBT builds are usually sync, and changes are mode
>> together. Let me know if I am underding your change wrongly.
>>
>>
>> On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad 
>> wrote:
>>
>>> Hi Hyukjin,
>>>
>>> Can you clarify what is broken in SBT? As I already mentioned in my
>>> previous e-mail, I tested both maven and SBT builds and both works. Is
>>> there a test that I can run that shows what is broken? Please file JIRA, so
>>> we can communicate on the JIRA and document what is broken.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon  wrote:
>>>
>>> That only fixes Maven. Both SBT build and Maven build should work in the
>>> same or similar wat. Let's make sure both work.
>>>
>>> On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad 
>>> wrote:
>>>
 Please see https://github.com/vrozov/spark/tree/spark-shell. I tested
 only spark-shell —remote local after building with maven and sbt. It may
 not be a complete fix and there is no PR. I’ll look into SBT build issue
 (assuming that there is still one after the fix) once you file JIRA.

 Thank you,

 Vlad


 On Mar 26, 2025, at 8:27 AM, Hyukjin Kwon  wrote:

 Vlad,

 - Please show me if there is a simple fix. If that's the case, yes, I
 will revert this out from the master branch. That works for me.
 - If not, let's make a new PR.
 - If you feel this is an issue, let's start a vote. Let me know.


 On Thu, 27 Mar 2025 at 00:13, Rozov, Vlad 
 wrote:

> Every graduated from incubating Apache project has guards against what
> you name “chaotic” and what other name breaking best development 
> practices.
> Such guards include JIRA, unit tests and PR review. Instead of reverting
> commit, I would expect you to open JIRA and outline what is broken. If you
> filed JIRA, please provide the link, if not, please open one.
>
> It took me 2 hours to fix Spark shells, so should you open JIRA
> instead of spending time to identify the commit and reverting it, you will
> save time as well. I’ll post fix once JIRA is open and I validate that my
> understanding of what is broken is the same as yours.
>
> Thank you,
>
> Vlad
>
> On Mar 26, 2025, at 12:01 AM, Hyukjin Kwon 
> wrote:
>
> Rozov, please test the patch, see if there is a relevant test or not,
> and add a test if not there. If it is difficult to add a test, describe it
> in the PR description, and how you manually tested.
> This is what I think you nee

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Here's a bit of history and context:

The project was initially built using SBT (
https://github.com/apache/spark/commit/df29d0ea4c8b7137fdd1844219c7d489e3b0d9c9
).
Later, Maven support was added (
https://github.com/apache/spark/commit/811a32257b1b59b042a2871eede6ee39d9e8a137
)
to provide an alternative to SBT and improve publishing options, such as
Debian packages and classifiers (
https://lists.apache.org/thread/dco0s7jp0rgfns3cptfkwohbc7xl5lj4).

Our CI was originally based on SBT, which is faster than Maven - at least
for the Spark project. Many build scripts and tools also rely on SBT, such
as run-tests.py
.

We initially ran CI on Jenkins but later migrated to GitHub Actions. At one
point, there was some discussion about switching to Maven builds, but they
were too slow.
To make matters worse, we faced requests to reduce GitHub Actions resource
usage (https://issues.apache.org/jira/browse/SPARK-35119).
In response, we leveraged contributors' forks to offload resource
consumption.
Later, we were asked to reduce it even further (
https://issues.apache.org/jira/browse/SPARK-48094), which we successfully
managed.



On Thu, Mar 27, 2025 at 11:21 AM Wenchen Fan  wrote:

> A slightly off-topic but related question: It feels fragile to test with
> SBT while publishing the release with Maven. How did we end up in this
> situation? Moreover, since most Spark developers use SBT for their daily
> work, it becomes even harder to catch issues with the Maven build.
>
> On Thu, Mar 27, 2025 at 9:15 AM Hyukjin Kwon  wrote:
>
>> Nah, I wasn't clear.
>>
>> Maven and SBT builds are synced for this special code path, e.g.,
>> https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
>> .
>> If you build Maven and SBT, the results are almost the same.
>>
>> Now, the fix you landed in Maven (and indeed it was a Maven specific fix).
>> However, it changes the syncing part between Maven and SBT, and it
>> results in a different output.
>> The CI in the PR passed because we run SBT in PR builders.
>>
>> My suggestion is to fix both at the same time as usual PRs have been so
>> also considering that the fix had an issue.
>> It would also need some closer looks who are used to SBT as well to make
>> sure the same issue does not happen.
>>
>>
>>
>> On Thu, 27 Mar 2025 at 10:06, Rozov, Vlad 
>> wrote:
>>
>>> Sorry, but I still don’t follow. My PR broke Maven and the fix I
>>> provided fixes Maven. SBT was never broken except there is inconsistency
>>> between SBT and Maven builds. Can the inconsistency be fixed in a follow up
>>> PR?
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 26, 2025, at 5:57 PM, Hyukjin Kwon  wrote:
>>>
>>> It is not broken ... because we run SBT in PR builders for ASF resource
>>> restrictions and faster build. We use Maven for release so it was found out
>>> now.
>>>
>>> CI did not test your change. The part you are fixing is a special path ..
>>>
>>> On Thu, Mar 27, 2025 at 9:53 AM Rozov, Vlad 
>>> wrote:
>>>
 If it is not broken, can the sync between maven and SBT
 dependencies/shadow be done in a follow up PR?

 Thank you,

 Vlad


 On Mar 26, 2025, at 5:44 PM, Hyukjin Kwon  wrote:

 It is not broken. The fix you applied would not be applied in SBT. For
 example, the lines you changed (added in
 https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
 ):

 diff```
 -
 -  com.google.common
 -
 ${spark.shade.packageName}.connect.guava
 -  
 -com.google.common.**
 -  
 -
 ```

 The companion part of this has to be removed in SBT as well, for
 example here
 https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3eR674

 Adding simple dependency changes usually are shared between SBT and
 Maven so it is not a tricky part. However, some changes like this have to
 be made in both places.
 Both Maven and SBT builds are usually sync, and changes are mode
 together. Let me know if I am underding your change wrongly.


 On Thu, 27 Mar 2025 at 09:25, Rozov, Vlad 
 wrote:

> Hi Hyukjin,
>
> Can you clarify what is broken in SBT? As I already mentioned in my
> previous e-mail, I tested both maven and SBT builds and both works. Is
> there a test that I can run that shows what is broken? Please file JIRA, 
> so
> we can communicate on the JIRA and document what is broken.
>
> Thank you,
>
> Vlad
>
> On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon 
> wrote:
>
> That only fixes Maven. Both SBT build and Maven build should work in
> the same or similar wat. Let's make sure both work.
>
> On Thu, Mar 27, 2025 at 3:18 AM Roz

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-26 Thread Mich Talebzadeh
Because of dependencies we need to ensure that the underlying artifacts
(Hive 4.0.1) is also stable enough. We should aim to establish that first
and look for release timelines and where it fits

cheers

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile






On Wed, 26 Mar 2025 at 06:02, Rozov, Vlad  wrote:

> I started working on it. See https://github.com/apache/spark/pull/50213.
> Review and comments on the PR will help a lot.
>
> +1 for 4.1. It won’t be ready for 4.0 and will require extensive testing.
>
> I have few more local changes that fixes some tests in sql/hive and should
> publish another revision soon.
>
> Thank you,
>
> Vlad
>
>
> On Mar 25, 2025, at 10:12 PM, Wenchen Fan  wrote:
>
> I agree, 4.0 is already in the RC stage and I think it's too late to do
> such a big version bump for the Hive dependency.
>
> We definitely need to do this upgrade and thanks for working on it!
>
> On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua <
> angel.alvarez.pas...@gmail.com> wrote:
>
>> That's great news! If you need anything from me, just ask...
>>
>> We should also check and update other non-Hive third-party libraries with
>> high/critical vulnerabilities, as someone mentioned in another email thread.
>>
>> Since this is a major change, I think we should leave it for Spark 4.1.
>> What do you think?
>>
>> El lun, 24 mar 2025 a las 0:59, Mich Talebzadeh (<
>> mich.talebza...@gmail.com>) escribió:
>>
>>> For now, I am testing apache-hive-4.0.1-bin which is the latest release
>>> version  from
>>>
>>> https://dlcdn.apache.org/hive/hive-4.0.1/
>>>
>>> apache-hive-4.0.1-bin.tar
>>>
>>> My metastore is Oracle and upgrade scripts are provided..
>>>
>>> My previous version is Hive 3.1.1 and the metastore upgrade went OK
>>> without any major headache.
>>>
>>> Now I just need to customise various files under $HIVE_HOME/conf and
>>> then I will have some testing underway.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>
>>>
>>>
>>> On Sun, 23 Mar 2025 at 17:13, Ángel Álvarez Pascua <
>>> angel.alvarez.pas...@gmail.com> wrote:
>>>
 Well ... and then? When are we going to tackle this? I could help.

 El mié, 12 mar 2025, 15:50, Mich Talebzadeh 
 escribió:

> Agreed. Hive upgrade is more time consuming as it involves backing up
> Hive schema on your metastore and then running Hive provided upgrade 
> schema
> scripts against Hive schema that could be problematic,but needs to be done
> one way or another.
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>view my Linkedin profile
> 
>
>
>
>
>
> On Wed, 12 Mar 2025 at 12:21, Ángel 
> wrote:
>
>> Not an easy task, I guess, but I'm totally for it too.
>>
>> The issue SPARK-49910
>>  is related to
>> this.
>>
>> El mar, 11 mar 2025 a las 23:06, Mich Talebzadeh (<
>> mich.talebza...@gmail.com>) escribió:
>>
>>> Yes I am all  for it, as I use Hive with Oracle as its metastore
>>> extensively.
>>>
>>> Case in point, on 6th March A Hive user
>>> 
>>> alluded to it and I quote
>>>
>>> "I just wanted to highlight that Hive 3.x line is EOL. It has
>>> various known security vulnerabilities, many serious bugs (including 
>>> wrong
>>> results and data corruption), and lacks lots of improvements and major
>>> features that are available in Hive 4. Upgrading is the right path 
>>> forward."
>>>
>>>  In summary, Hive 4.x likely includes performance improvements, new
>>> features, and bug fixes. Compiling against it would allow Spark to take
>>> advantage of these. Plus using the latest versions of both Spark and 
>>> Hive
>>> is important for maintaining a secure data platform.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 11 Mar 2025 at 19:08, Rozov, Vlad 
>>> wrote:
>>>
 Hi All,

 As Apache Hive announced EOL for Hive 2.x [1] and 3.x [2], should
 Spark be compiled against Hive 4.x and use it as default?

 Thank you,

 Vlad

 [1]
 https://lists.apache.org/thread/4ctrzfw60jkhc0hq2xoh1jpqxgt2z

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Rozov, please test the patch, see if there is a relevant test or not, and
add a test if not there. If it is difficult to add a test, describe it in
the PR description, and how you manually tested.
This is what I think you need to do instead of reverting the revert.
Imagine that there are many of such PRs living in the master branch.
Thankfully this time you are alone but imagine many people do the same
things - it will be chaotic. To me, it took 4 hours to identify this single
commit.

About the change itself, you need to fix SBT together, and it will need the
changes such as
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0
.
Should also test Spark shells, and describe how you tested it as well.

This is what I expect:
- Please show me if there is a simple fix. If that's the case, yes, I will
revert this out from the master branch. That works for me.
- If not, let's make a new PR.
- If you feel this is an issue, let's start a vote. Let me know.



On Wed, 26 Mar 2025 at 15:07, Reynold Xin 
wrote:

> Sorry Vlad - I disagree. Where is the simple fix? As a new contributor,
> you should not be coming in guns blazing blaming committers who are trying
> to keep the master branch sane and clean.
>
> On Tue, Mar 25, 2025 at 10:53 PM Rozov, Vlad 
> wrote:
>
>> There is a simple fix. This is exactly what I outlined in the e-mail.
>> Prior to reverting commit (on master) it was necessary to see if an easy
>> fix exists. The PR that introduced the error was merged into master 3
>> weeks ago, so I still don’t get why it was reverted overnight. It was
>> also necessary to open JIRA and provide reproducible steps. I assume that
>> steps on the PR is what needs to be fixed, but it will be better to avoid
>> guessing.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 25, 2025, at 10:05 PM, Reynold Xin 
>> wrote:
>>
>> Is there a fix already available or a very simple fix a committer can
>> create quickly? If yes, we can merge the fix. If there isn't, for major
>> functionality breaking change, we should just revert. That's fairly basic
>> software engineering practices.
>>
>>
>> On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon 
>> wrote:
>>
>>> With the change, the main entry points, Spark shalls, don't work and
>>> developers cannot debug and test. The snapshots become uesless.
>>>
>>> The tests passed because you did not fix SBT. It needs a larger change.
>>>
>>> Such change cannot be in the source. I can start a vote if you think
>>> this is an issue.
>>>
>>>
>>> On Wed, Mar 26, 2025 at 12:32 PM Rozov, Vlad 
>>> wrote:
>>>
 This does not make any sense.

 1. There are no broken tests introduced by
 https://github.com/apache/spark/pull/49971
 2. There are no JIRA filed for “the main entry point”
 3. “The main entry point” that does not have any unit test suggests
 that it is not the main entry point.
 4. It is not practical to revert every commit that introduced some
 functional issue or regression. If regression can’t be fixed, it is
 reverted, if it can be reasonably fixed, the fix is applied to the
 regression. Just as an example,
 https://github.com/apache/spark/pull/42823 introduced a regression. It
 was fixed by https://github.com/apache/spark/pull/50348. Nobody asked
 for the revert.
 5. The way how revert was handled is not "Apache way”.

 It may be easier for you to handle it in a single PR, but it is easier
 for me to provide PR on top of existing changes and it will also provide
 JIRA and a commit that documents why there is an issue. Otherwise it will
 be easier to introduce the error back again.

 Thank you,

 Vlad


 On Mar 25, 2025, at 4:58 PM, Hyukjin Kwon  wrote:

 Rozov, this broke the main entry points of release, Spark shells. Even
 in the mast branch, you build a Spark, and cannot use Spark shells.
 Why don't you submit a PR that contains the proper fix? It is easier to
 have one PR that has no issue, e.g., reverting backporting etc.

 On Wed, 26 Mar 2025 at 00:17, Rozov, Vlad 
 wrote:

> Hi All,
>
> I kind of understand why https://github.com/apache/spark/pull/49971 was
> reverted on the branch-4.0 to allow testing of 4.0 release. Why was it 
> also
> reverted on the master branch? I don’t see any JIRA being open for the
> failure. AFAIK, the proper way to handle the issue in Apache project:
>
> - revert  https://github.com/apache/spark/pull/49971 on the
> branch-4.0 (note that PR targeted master branch and SPARK-51229 targets 
> 4.1
> as affected version)
> - open JIRA and provide details of how to reproduce the error on the
> master branch
> - wait for the author of https://github.com/apache/spark/pull/49971 (in
> this case it was me) to reply in a timely manner and only then revert the
> commit
>
> If you agree with the approach, please undo the revert and file J