[DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Jungtaek Lim
Hi dev,

I'm really tired of the discussion which does not move forward because the
argument is not backed by strict ASF policy. We debate based on
the interpretation of ASF policy by individuals, which I think makes zero
sense.

I have thought a lot about how to resolve this, and I'm now open to adding
a hack to the migration logic to eliminate the main concern in the debate.
The options I can think of are as follows:

1) Check the config string against a pattern: the string ends with
".optimizer.pruneFiltersCanPruneStreamingSubplan" and is longer than
"spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" (or strictly 12
chars longer).

2) Hash the incorrect config name with any hashing algorithm and compare
against the hash. We would not document the original string, but we would
probably reference the offending ticket so that we can at least recover the
original string if we have forgotten it when we debug this later.

3) Etc. (I'm open to better ideas, except just removing the migration
logic.)

Overall, the idea is indirect comparison.
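
For illustration only, options 1 and 2 above could be sketched roughly as
follows. This is not Spark's actual migration code (which is Scala); it is a
minimal Python sketch under stated assumptions. The function names, the
`migrate` helper, the choice of SHA-256, and the idea of passing the expected
digest in as a parameter are all hypothetical. Only the two quoted config
strings come from this thread.

```python
import hashlib
from typing import Callable, Dict, Optional

# Both strings are quoted in the proposal above.
CANONICAL_KEY = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
SUFFIX = ".optimizer.pruneFiltersCanPruneStreamingSubplan"


def matches_by_pattern(key: str, extra_chars: Optional[int] = None) -> bool:
    """Option 1: suffix match plus a length check (hypothetical helper).

    The length check excludes the canonical key itself. If extra_chars is
    given, the key must be exactly that many characters longer.
    """
    if not key.endswith(SUFFIX) or len(key) <= len(CANONICAL_KEY):
        return False
    if extra_chars is not None:
        return len(key) == len(CANONICAL_KEY) + extra_chars
    return True


def sha256_hex(key: str) -> str:
    """Hex digest of a config key (SHA-256 is an assumption; any hash works)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def matches_by_hash(key: str, expected_digest: str) -> bool:
    """Option 2: compare digests so the legacy name never appears in code."""
    return sha256_hex(key) == expected_digest


def migrate(conf: Dict[str, str],
            is_legacy_key: Callable[[str], bool]) -> Dict[str, str]:
    """Copy a legacy key's value to the canonical key if it is not set yet."""
    migrated = dict(conf)
    for key, value in conf.items():
        if is_legacy_key(key) and CANONICAL_KEY not in migrated:
            migrated[CANONICAL_KEY] = value
    return migrated
```

The trade-off seems to be that the pattern check can also match unrelated keys
sharing the suffix unless the length is pinned, while the hash check matches
exactly one string but is opaque to readers, which is why option 2 mentions
keeping a pointer to the ticket.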

This definitely complicates the logic, but the logic is really just 4 lines
to begin with, and maybe it will be just 10 lines or so, so it's still
something manageable.

It might be slightly tricky to test, but this is a lot easier than
continuing the debate. I now regret not allowing myself to introduce a
hack earlier; we could have saved 3 weeks.

If we are OK with allowing a bit of indirect checking in the logic just to
end the debate, I'm happy to do that. We could then keep this migration
logic much longer than I was proposing, because the main concern would be
eliminated.

I'm open to hearing support and objections.

Thanks,
Jungtaek Lim (HeartSaVioR)


Re: [DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Jungtaek Lim
Apologies for the quick correction: this won't go to VOTE "unless" there are
"valid" objections. The earlier clarification was sent accidentally before I
double-checked it.


Re: [DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Jungtaek Lim
Just to clarify: This won't go to VOTE as long as there are "valid"
objections. Let's not waste more time on our end.


Re: [DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Mark Hamstra
Doing something like pattern matching on
\u0064\u0061\u0074\u0061\u0062\u0072\u0069\u0063\u006b\u0073 instead of
“databricks” might also be an option if including “databricks” in the code
is believed to be so offensive.
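
To make the suggestion concrete, here is a minimal Python sketch (Python is
used only for illustration, and `contains_vendor_token` is a hypothetical
helper, not anything in Spark). It shows that the escaped form denotes the
same string once the source is parsed, so matching on it behaves the same as
matching on the literal; only the source text differs.

```python
# The escaped spelling from the message above. The parser turns each
# \uXXXX escape into the ordinary character it encodes.
ESCAPED_TOKEN = "\u0064\u0061\u0074\u0061\u0062\u0072\u0069\u0063\u006b\u0073"


def contains_vendor_token(key: str) -> bool:
    """Return True if any dot-separated segment of key equals the token."""
    return ESCAPED_TOKEN in key.split(".")
```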


Re: [DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Jungtaek Lim
Yeah... maybe option 1 is simpler if it has no side effects, and the suffix
pattern we have is probably long enough to identify aliases without
full-text matching.


[ANNOUNCE] Apache Sedona 1.7.1 released

2025-03-16 Thread Jia Yu
Dear all,

We are happy to report that we have released Apache Sedona 1.7.1.
Thank you again for your help.

Apache Sedona is a cluster computing system for processing large-scale
spatial data on top of Apache Spark, Flink and Snowflake.

Vote thread (Permalink from https://lists.apache.org/list.html):
https://lists.apache.org/thread/zy77psfpyhgys31jf3x1y89hmf9o522h

Vote result thread (Permalink from https://lists.apache.org/list.html):
https://lists.apache.org/thread/ch990fhjcl9jvjbs0m9zfb785cyj96m3

Website:
http://sedona.apache.org/

Release notes:
https://github.com/apache/sedona/blob/sedona-1.7.1/docs/setup/release-notes.md

Download links:
https://github.com/apache/sedona/releases/tag/sedona-1.7.1

Additional resources:
Mailing list: d...@sedona.apache.org
X: https://x.com/ApacheSedona

Regards,
Apache Sedona Team

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-16 Thread Martin Grund
So I was just playing with the Swift client to build a Mac app and
everything worked nicely! However, similar to the Go repository, my
suggestion would be to handle issues directly in GitHub and not in Jira,
because the vast majority of initial issues will simply be compatibility
issues. Creating them in Jira instead of GH, where I already am, is going
to be a big overhead.

What do you think?

In the Go client, we follow the approach that cross-language issues go to
Jira, whereas client-specific ones go directly into GitHub.

On Tue, Mar 11, 2025 at 3:28 AM Jules Damji  wrote:

> + 1 (non-binding)
>
> Generally speaking, it’s a good idea to separate repositories for all
> Spark Connect clients under Spark.
> - better organization
> - better visibility
> - easier for contribution
> - better for growth & extension of Spark Connect ecosystem
>
> Cheers
> Jules
> —
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
>
> > On Mar 10, 2025, at 4:37 PM, Dongjoon Hyun  wrote:
> >
> > Thank you everyone for your support.
> >
> > New Apache Spark repository is created at the proposed location with ASF
> > license and open for `Spark Connect Client for Swift language`
> > contributions.
> >
> > https://github.com/apache/spark-connect-swift
> >
> > FYI, this repository will be managed in the same way with
> > `spark-kubernetes-operator` repository.
> >
> > Thank you again.
> >
> > Dongjoon.
> >


Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-16 Thread Dongjoon Hyun
Thank you for the suggestion, Martin.

I prefer to follow the way of the `spark` and `spark-kubernetes-operator`
repositories, as the traditional Apache Spark way.

In addition, it seems that I'm going to maintain the `spark-connect-swift`
repository as the main contributor and main reviewer in most cases. So,
please allow me to use my preferred way as the main worker on that repository.

Could you tell me how the Apache Spark JIRA is blocking you, Martin? You are
an Apache Spark committer, who is supposed to use the Apache Spark JIRA for
Spark contributions. I'm interested in what kind of difficulty blocks an
Apache Spark committer.

Thanks,
Dongjoon.


Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
OK, let's be super honest.

Again, I think you agree that *"both" proposals are "technically" correct
(or that neither side has strong theoretical evidence to counter the other
side)*. So this is naturally fated to be decided by whichever side has more
supporters in the end. It would be very easy for me to VETO his proposal
(although I don't have a binding vote, I think there are people who agree
with me) if we decide we definitely want to expand the interpretation of the
VETO criteria in the Apache Voting Process.

You said it is up to the PMC member exercising the veto to use their
judgement, but it definitely must not be used to force the community to
follow his proposal. The major argument here is that he can just VETO any
proposal in order to keep the codebase the way he prefers, which I don't
believe is a correct usage of VETO.

If we just revert the change that removed the config, that is "really"
neutral: neither my proposal nor his proposal. Do we really want to do so?

On Mon, Mar 17, 2025 at 10:55 AM Holden Karau 
wrote:

> First let me start with my key hope:
>
> We find a way to compromise and have the veto withdrawn rather than
> overridden.
>
> From what I understand of the change in question:
>
> So my understanding, and I may be over simplifying here but there are (at
> least) three technical paths forward (migration guide, legacy config with
> vendor string in it, non-vendor specific string legacy config), a PMC
> member vetoed one of them (named vendor legacy config) because he thought a
> different approach was better (migration guide) as they were worried that
> carrying that legacy config forward would encourage bad coding standards
> (eg we would add more vendor named config flags). To me that seems like a
> valid concern.
>
> My reasoning:
>
> Thinking back at other VETOs that I’ve been involved with in this project
> (DSV2, graceful decom, etc) this seems to meet the same bar. Hell we’ve had
> plenty of vetos that didn’t offer an alternative.
>
> My personal understanding of where the bar for “a technical justification
> showing why the change is bad” concern is pretty much “any not factually
> incorrect reasoning”, the text doesn’t have any particular “bar” for the
> level of “badness” and I think it’s up to the PMC member exercising the
> veto to use their judgement.
>
> In closing, I feel like the path we’re going down (overriding a veto) is
> not healthy for the project.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Sun, Mar 16, 2025 at 6:28 PM Jungtaek Lim 
> wrote:
>
>> Holden, I believe you should already know "both" approaches are
>> "technically" correct. It's not about which one you have a preference for,
>> no, this VOTE is not intended to extend the debate.
>>
>> Again, what you are encouraged to do here is, not exposing your
>> preference of two approaches, but exposing your "technically valid" concern
>> of my approach, backed by Dongjoon's veto (most likely you want to quote
>> Dongjoon's post). This is very simple and I'm not sure you are doing
>> exactly what the VOTE requires.
>>
>> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau 
>> wrote:
>>
>>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>>> objection is technical then I think we need to respect the veto. There is a
>>> fundamental disagreement as to what the correct technical way to address
>>> this problem is (removal + documentation vs legacy config) and a PMC member
>>> has vetoed  the legacy config option.
>>>
>>> I think I disagree with Mark on the assertion that the veto needs to
>>> have “substantial technical concern,” but rather a valid concern. I think
>>> in addition to the veto they’ve also gone above and beyond providing
>>> alternative ways to accomplish this.
>>>
>>> On a personal level:
>>>
>>> I am optimistic we can unblock the release but I think it’s important to
>>> err on the side of respecting the veto here in the interest of perceived
>>> fairness *especially* because of vendor aspects.
>>>
>>> To be clear I’ve worked at most of these companies (and many of the
>>> people) and I’m not ascribing malice to anyone in this, I think mistakes
>>> happen (god knows I’ve had a fair share). I think we’re all doing our best
>>> here and would ask that we show everyone understanding regardless of the
>>> outcome.
>>>
>>> Sending hugs and good vibes to y’all.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> 
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkara

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
I've created the revert PR for branch-4.0:
https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
consensus but it's clear that this breaking change PR has failed to achieve
consensus.

I hope we now have a clear foundation for discussing solutions. As it
stands, the misnamed configuration will be released in 4.0.0. I like
Jungtaek’s proposal to deprecate it, but the decision is up to the
community.


Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
I agree with Holden that withdrawing a veto is always better than
overriding it: it's healthier for the community. Dongjoon, would you be
willing to reconsider your veto given the current as-is state of the 4.0.0
release (the breaking change will be reverted)?


Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Holden and Dongjoon,

Let me make this vote super simple. I never got an answer from Dongjoon
to this question. This is super important because if he's casting a veto
"to block", it is a strong indication that he intended to play games with
me, in which case I am seriously considering escalating the problem (if this
is true, it's no longer just about the justification of a vote, but about
someone's behavior).

https://github.com/apache/spark/pull/49983

I might be missing another timeline, but, if you follow the conversation
here, there are some facts:

1. Dongjoon "knew" we had never decided on the direction of Spark 4.0.0
behavior. (link)
2. Dongjoon "agreed" my proposal is technically correct. (link)
3. Dongjoon "agreed" to hear from the community by discussing my
proposal. (link)

Worth clarifying: 3 happened after we discussed the removal of the "config".
Dongjoon has continuously mixed up the facts: while we agreed on the
removal of the config, the removal of the migration logic was definitely
left as an open question. Let me point to the VOTE that Dongjoon drove and
got passed.

https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5

This was entirely about 3.5.5. If Dongjoon thinks this simply applies to
Spark 4.0.0+ as well, it does not, no?

Also, let's revisit the discussion we had about the removal of the
config.

https://lists.apache.org/thread/qxqzt7wbtdyxp17d7s1rxhnrswdccsgb

Dongjoon clearly stated that we only reached consensus on Spark 3.5.5,
and that we could continue discussing the proper behavior in Spark 4.0.0.
That is the rationale on which I drove my own discussion. I can be
corrected, but there has been NO discussion/vote w.r.t. this topic AFAIK.

Dongjoon, now it's your time to prove there is a valid reason to change
your mind during this time frame. If the above are all true, you are
already indicating that you can never cast a veto. (Or show me the evidence
of how you changed your mind and for what reason.) If any of the above is
something where you intended not to tell the truth, I am really not sure I
can trust your comments to be truthful. In particular, if you did not tell
the truth in 3, e.g. you let me go and discuss while you intended to block
me at any phase, this is a strong indication that you intend to play with
me, and the community (or even the ASF) has to know that.

Do not evade the root question.

On Mon, Mar 17, 2025 at 6:32 AM Holden Karau  wrote:

> -1 (binding) — to me it doesn’t matter that the cost is low if the
> objection is technical then I think we need to respect the veto. There is a
> fundamental disagreement as to what the correct technical way to address
> this problem is (removal + documentation vs legacy config) and a PMC member
> has vetoed  the legacy config option.
>
> I think I disagree with Mark on the assertion that the veto needs to have
> “substantial technical concern,” but rather a valid concern. I think in
> addition to the veto they’ve also gone above and beyond providing
> alternative ways to accomplish this.
>
> On a personal level:
>
> I am optimistic we can unblock the release but I think it’s important to
> err on the side of respecting the veto here in the interest of perceived
> fairness *especially* because of vendor aspects.
>
> To be clear I’ve worked at most of these companies (and many of the
> people) and I’m not ascribing malice to anyone in this, I think mistakes
> happen (god knows I’ve had a fair share). I think we’re all doing our best
> here and would ask that we show everyone understanding regardless of the
> outcome.
>
> Sending hugs and good vibes to y’all.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Sat, Mar 15, 2025 at 5:07 PM Holden Karau 
> wrote:
>
>> Given it’s the weekend maybe let’s give folks at least one full work day.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra 
>> wrote:
>>
>>> Quick administrative note: I don't see any reason why this vote should
>>> take a long time, so I expect to close the process and tally the votes
>>> in not much more than 48 hours.
>>>
>>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra 
>>> wrote:
>>> >
>>> > There has been enough discussion on th

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
I'm delighted to see folks talking about a compromise. However, instead of
just asking Dongjoon to withdraw the VETO, perhaps folks can suggest
alternatives that might meet some of both parties' goals?

On Sun, Mar 16, 2025 at 7:41 PM Wenchen Fan  wrote:

> I agree with Holden that withdrawing a veto is always better than
> overriding it: it's healthier for the community. Dongjoon, would you be
> willing to reconsider your veto given the current as-is state of the 4.0.0
> release (the breaking change will be reverted)?
>
> On Mon, Mar 17, 2025 at 10:36 AM Wenchen Fan  wrote:
>
>> I've created the revert PR for branch-4.0:
>> https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
>> consensus but it's clear that this breaking change PR has failed to achieve
>> consensus.
>>
>> I hope we now have a clear foundation for discussing solutions. As it
>> stands, the misnamed configuration will be released in 4.0.0. I like
>> Jungtaek’s proposal to deprecate it, but the decision is up to the
>> community.
>>
>> On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> OK, let's be super honest.
>>>
>>> Again, I think you agree that *"both" proposals are "technically"
>>> correct (or one side can't have a strong theoretical evidence to counter
>>> the other side)*. So this naturally has a fate to have more supporters
>>> to get to the end. It's very easy for me to VETO to his proposal (although
>>> I don't have a binding vote, I think I have people who agree with me) if we
>>> think we want to definitely expand the interpretation of VETO criteria in
>>> the Apache Voting Process.
>>>
>>> You said it is up to the PMC member exercising the veto to use their
>>> judgement, but definitely, it must not be used to force the community to
>>> follow his proposal. The major argument here is, he can just VETO to any
>>> proposal to retain the codebase as the way he prefers to, which I don't
>>> believe is a correct usage of VETO.
>>>
>>> If we just revert the change of removal of config, this is "really"
>>> neutral neither my proposal nor his proposal. Do we really want to do so?
>>>
>>> On Mon, Mar 17, 2025 at 10:55 AM Holden Karau 
>>> wrote:
>>>
>>>> First let me start with my key hope:
>>>>
>>>> We find a way to compromise and have the veto withdrawn rather than
>>>> overridden.
>>>>
>>>> From what I understand of the change in question:
>>>>
>>>> So my understanding, and I may be over simplifying here but there are
>>>> (at least) three technical paths forward (migration guide, legacy config
>>>> with vendor string in it, non-vendor specific string legacy config), a PMC
>>>> member vetoed one of them (named vendor legacy config) because he thought a
>>>> different approach was better (migration guide) as they were worried that
>>>> carrying that legacy config forward would encourage bad coding standards
>>>> (eg we would add more vendor named config flags). To me that seems like a
>>>> valid concern.
>>>>
>>>> My reasoning:
>>>>
>>>> Thinking back at other VETOs that I’ve been involved with in this
>>>> project (DSV2, graceful decom, etc) this seems to meet the same bar. Hell
>>>> we’ve had plenty of vetos that didn’t offer an alternative.
>>>>
>>>> My personal understanding of where the bar for “
>>>> a technical justification showing why the change is bad” concern is
>>>> pretty much “any not factually incorrect reasoning”, the text doesn’t have
>>>> any particular “bar” for the level of “badness” and I think it’s up to the
>>>> PMC member exercising the veto to use their judgement.
>>>>
>>>> In closing, I feel like the path we’re going down (overriding a veto)
>>>> is not healthy for the project.
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>>
>>>> On Sun, Mar 16, 2025 at 6:28 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:

>>>>> Holden, I believe you should already know "both" approaches are
>>>>> "technically" correct. It's not about which one you have a preference for,
>>>>> no, this VOTE is not intended to extend the debate.
>>>>>
>>>>> Again, what you are encouraged to do here is, not exposing your
>>>>> preference of two approaches, but exposing your "technically valid" concern
>>>>> of my approach, backed by Dongjoon's veto (most likely you want to quote
>>>>> Dongjoon's post). This is very simple and I'm not sure you are doing
>>>>> exactly what the VOTE requires.
>>>>>
>>>>> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau
>>>>> wrote:
>>>>>
>>>>>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>>>>>> objection is technical then I think we need to respect the veto. There
>>>

Re: [Discuss] SPIP: Support NanoSecond Timestamps

2025-03-16 Thread Dongjoon Hyun
+1 for supporting NanoSecond Timestamps.

Thank you, Qi.

Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Unsubscribe

2025-03-16 Thread Zac Wang






Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
I was trying hard to stay away from this VOTE, but I should have reminded
everyone about "what" we are going to VOTE on.

Dongjoon cast a VETO against the code change VOTE. That VETO is described
on the ASF Voting Process page:

https://www.apache.org/foundation/voting.html#Veto

> A -1 vote by a qualified voter stops a code-modification proposal in its
> tracks. This constitutes a veto, and it cannot be overruled nor overridden
> by anyone. Vetoes stand until and unless the individual withdraws their
> veto.
>
> To prevent vetoes from being used capriciously, the voter must provide
> with the veto a technical justification showing why the change is bad
> (opens a security exposure, negatively affects performance, etc. ). A veto
> without a justification is invalid and has no weight.


For sure, the technical justification must be "objective"; otherwise it
means that if I'm a PMC member I can veto anything I don't like.

The main argument here, about "vendor name in the codebase", is NOT
something we have ever seen disallowed in ASF policy. If there were evidence
of that, it would immediately kill the two VOTEs, as it would be an
objective enough argument. But no one has been able to bring it up. Please
remember, the claim that "vendor name in the codebase is bad for any reason"
is proven NOT to be an "objective" claim; otherwise, how were the DISCUSS
and VOTE almost passing with support from PMC members?

I really suggest that everyone who casts a vote in this VOTE thread base it
on an "objective" rationale. For example, we tend to consider < 10 lines of
code to be very trivial to maintain, so the argument of maintenance burden
does not apply here. Like this.


On Sun, Mar 16, 2025 at 8:38 AM Mark Hamstra  wrote:

> There has been enough discussion on this topic already, so I think
> that an immediate vote on the validity of Dongjoon's technical
> justification for his veto of the "Retain migration logic ... in Spark
> 4.0.x" proposal is in order. That technical justification has been
> called into question, and the guidance at
> https://www.apache.org/foundation/glossary.html#Veto leaves it to the
> PMC to determine whether the technical justification is  valid: "In
> case of doubt, deciding whether a technical justification is valid is
> up to the PMC." As such, only PMC votes will decide the outcome of
> this vote. This is neither a vote on a code change itself nor a vote
> on whether a package is ready for release, so it is a procedural vote on
> whether the technical justification is valid. As such, the vote will
> be decided by a simple majority where +1 votes hold that the technical
> justification is not valid and -1 votes hold that the technical
> justification is valid.
>
> I would request that at least PMC members post more than just a naked
> vote, but instead endeavor to give some reason why they have assessed
> the technical justification as they have. I'll start:
>
> Despite all of the discussion related to Dongjoon's -1 vote, I must
> confess to still not being entirely clear on what is his technical
> justification for that veto. I see claims that including an admonition
> in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
> required to maintain the integrity of already existing data streams,
> and I see assertions about the maintenance burden that including the
> migration logic would impose on future Spark versions, but I don't
> think that I see any other technical objections. I do not believe that
> the claimed technical justification is valid.
>
> In requiring that a veto of a code change be accompanied by a
> technical justification for the veto, the Apache Voting Process states
> that: "To prevent vetoes from being used capriciously, the voter must
> provide with the veto a technical justification showing why the change
> is bad (opens a security exposure, negatively affects performance,
> etc. ). A veto without a justification is invalid and has no weight."
> This strongly implies that there must be something objectively wrong
> with the proposed code change in that it causes significant harm in
> the way of opening a security exposure, negatively affecting
> performance, or presumably other significant user harms or perhaps
> even developer burdens.
>
> The proposed addition of the migration logic to Spark 4.0.x does not
> cause any harm to Spark's users. For many users, those not using
> streaming data, the change will have no effect. For streaming users
> the change will be beneficial, not harmful.
>
> Neither do I find the claim of excessive, ongoing developer burden to
> be persuasive. The changes are tiny and easily maintained -- in fact,
> it wouldn't surprise me if no further changes to this migration logic
> would be needed for a very long time.
>
> Some of what we are left with is just an expression of preference for
> a technical alternative to the migration logic -- i.e. including in
> the release notes an admonition to first upgrade to 3.5.5. But the
> Apache Voting Process does not say that 

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Holden, I believe you should already know "both" approaches are
"technically" correct. It's not about which one you have a preference for,
no, this VOTE is not intended to extend the debate.

Again, what you are encouraged to do here is, not exposing your preference
of two approaches, but exposing your "technically valid" concern of my
approach, backed by Dongjoon's veto (most likely you want to quote
Dongjoon's post). This is very simple and I'm not sure you are doing
exactly what the VOTE requires.

On Mon, Mar 17, 2025 at 6:32 AM Holden Karau  wrote:

> -1 (binding) — to me it doesn’t matter that the cost is low if the
> objection is technical then I think we need to respect the veto. There is a
> fundamental disagreement as to what the correct technical way to address
> this problem is (removal + documentation vs legacy config) and a PMC member
> has vetoed  the legacy config option.
>
> I think I disagree with Mark on the assertion that the veto needs to have
> “substantial technical concern,” but rather a valid concern. I think in
> addition to the veto they’ve also gone above and beyond providing
> alternative ways to accomplish this.
>
> On a personal level:
>
> I am optimistic we can unblock the release but I think it’s important to
> err on the side of respecting the veto here in the interest of perceived
> fairness *especially* because of vendor aspects.
>
> To be clear I’ve worked at most of these companies (and many of the
> people) and I’m not ascribing malice to anyone in this, I think mistakes
> happen (god knows I’ve had a fair share). I think we’re all doing our best
> here and would ask that we show everyone understanding regardless of the
> outcome.
>
> Sending hugs and good vibes to y’all.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Sat, Mar 15, 2025 at 5:07 PM Holden Karau 
> wrote:
>
>> Given it’s the weekend maybe let’s give folks at least one full work day.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra 
>> wrote:
>>
>>> Quick administrative note: I don't see any reason why this vote should
>>> take a long time, so I expect to close the process and tally the votes
>>> in not much more than 48 hours.
>>>
>>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra 
>>> wrote:
>>> >
>>> > There has been enough discussion on this topic already, so I think
>>> > that an immediate vote on the validity of Dongjoon's technical
>>> > justification for his veto of the "Retain migration logic ... in Spark
>>> > 4.0.x" proposal is in order. That technical justification has been
>>> > called into question, and the guidance at
>>> > https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>>> > PMC to determine whether the technical justification is  valid: "In
>>> > case of doubt, deciding whether a technical justification is valid is
>>> > up to the PMC." As such, only PMC votes will decide the outcome of
>>> > this vote. This is neither a vote on a code change itself nor a vote
>>> > on whether a package is ready for release, so it is a procedural vote on
>>> > whether the technical justification is valid. As such, the vote will
>>> > be decided by a simple majority where +1 votes hold that the technical
>>> > justification is not valid and -1 votes hold that the technical
>>> > justification is valid.
>>> >
>>> > I would request that at least PMC members post more than just a naked
>>> > vote, but instead endeavor to give some reason why they have assessed
>>> > the technical justification as they have. I'll start:
>>> >
>>> > Despite all of the discussion related to Dongjoon's -1 vote, I must
>>> > confess to still not being entirely clear on what is his technical
>>> > justification for that veto. I see claims that including an admonition
>>> > in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>>> > required to maintain the integrity of already existing data streams,
>>> > and I see assertions about the maintenance burden that including the
>>> > migration logic would impose on future Spark versions, but I don't
>>> > think that I see any other technical objections. I do not believe that
>>> > the claimed technical justification is valid.
>>> >
>>> > In requiring that a veto of a code change be accompanied by a
>>> > technical justification for the veto, the Apache Voting Process states
>>> > tha

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
First let me start with my key hope:

We find a way to compromise and have the veto withdrawn rather than
overridden.

From what I understand of the change in question:

So my understanding, and I may be oversimplifying here, is that there are (at
least) three technical paths forward (migration guide, legacy config with
vendor string in it, non-vendor specific string legacy config). A PMC
member vetoed one of them (named vendor legacy config) because he thought a
different approach was better (migration guide), as he was worried that
carrying that legacy config forward would encourage bad coding standards
(e.g. we would add more vendor named config flags). To me that seems like a
valid concern.

My reasoning:

Thinking back at other VETOs that I’ve been involved with in this project
(DSV2, graceful decom, etc.) this seems to meet the same bar. Hell, we’ve had
plenty of vetoes that didn’t offer an alternative.

My personal understanding of where the bar for a “technical justification
showing why the change is bad” sits is pretty much “any not factually
incorrect reasoning”; the text doesn’t have any particular “bar” for the
level of “badness” and I think it’s up to the PMC member exercising the veto
to use their judgement.

In closing, I feel like the path we’re going down (overriding a veto) is
not healthy for the project.

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Sun, Mar 16, 2025 at 6:28 PM Jungtaek Lim 
wrote:

> Holden, I believe you should already know "both" approaches are
> "technically" correct. It's not about which one you have a preference for,
> no, this VOTE is not intended to extend the debate.
>
> Again, what you are encouraged to do here is, not exposing your preference
> of two approaches, but exposing your "technically valid" concern of my
> approach, backed by Dongjoon's veto (most likely you want to quote
> Dongjoon's post). This is very simple and I'm not sure you are doing
> exactly what the VOTE requires.
>
> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau 
> wrote:
>
>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>> objection is technical then I think we need to respect the veto. There is a
>> fundamental disagreement as to what the correct technical way to address
>> this problem is (removal + documentation vs legacy config) and a PMC member
>> has vetoed  the legacy config option.
>>
>> I think I disagree with Mark on the assertion that the veto needs to have
>> “substantial technical concern,” but rather a valid concern. I think in
>> addition to the veto they’ve also gone above and beyond providing
>> alternative ways to accomplish this.
>>
>> On a personal level:
>>
>> I am optimistic we can unblock the release but I think it’s important to
>> err on the side of respecting the veto here in the interest of perceived
>> fairness *especially* because of vendor aspects.
>>
>> To be clear I’ve worked at most of these companies (and many of the
>> people) and I’m not ascribing malice to anyone in this, I think mistakes
>> happen (god knows I’ve had a fair share). I think we’re all doing our best
>> here and would ask that we show everyone understanding regardless of the
>> outcome.
>>
>> Sending hugs and good vibes to y’all.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Sat, Mar 15, 2025 at 5:07 PM Holden Karau 
>> wrote:
>>
>>> Given it’s the weekend maybe let’s give folks at least one full work day.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> 
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>>
>>> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra 
>>> wrote:
>>>
>>>> Quick administrative note: I don't see any reason why this vote should
>>>> take a long time, so I expect to close the process and tally the votes
>>>> in not much more than 48 hours.
>>>>
>>>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra
>>>> wrote:
>>>> >
>>>> > There has been enough discussion on this topic already, so I think
>>>> > that an immediate vote on the validity of Dongjoon's technical
>>>> > justification for his veto of the "Retain migration logic ... in Spark
>>>> > 4.0.x" proposal is in or

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
Hi Holden,

> I think I disagree with Mark on the assertion that the veto needs to have
> “substantial technical concern,” but rather a valid concern. I think in
> addition to the veto they’ve also gone above and beyond providing
> alternative ways to accomplish this.

I think this is not what the Apache policy says in
https://www.apache.org/foundation/glossary.html#Veto : "*All vetoes must be
accompanied by a valid technical justification; a veto without such a
justification is invalid.*"

I think it's better for the Apache Spark community to follow the general
Apache policy.

On Mon, Mar 17, 2025 at 8:49 AM Wenchen Fan  wrote:

> Before I cast my vote here, I'd like to highlight one thing: As the
> release manager of Apache Spark 4.0.0, I was not notified about the
> breaking change of renaming an already-released configuration:
> https://github.com/apache/spark/pull/49897 . Note that the previous VOTE
> from Dongjoon was about Apache Spark 3.5.5, which means there is no
> consensus about what we should do for 4.0.0 yet. I think it's fair for me
> to ask to revert the breaking change and unblock 4.0.0, which is a common
> practice of how we handle breaking changes in Apache Spark and I don't
> think I need a VOTE for it.
>
> Of course, none of us want to keep the misnamed configuration in 4.0.0,
> and it’s clear to me that applying the “configuration deprecation” approach
> from 3.5.5 to 4.0.0 is the best path forward. I don’t believe Dongjoon’s
> veto has valid technical justification, so I’m +1 on this vote.
>
> Thanks,
> Wenchen
>
>
> On Mon, Mar 17, 2025 at 7:27 AM Jungtaek Lim 
> wrote:
>
>> I was trying hard to stay away from this VOTE, but I should have reminded
>> everyone about "what" we are going to VOTE.
>>
>> Dongjoon casted a VETO against code change VOTE. That VETO is described
>> in ASF Voting Process page:
>>
>> https://www.apache.org/foundation/voting.html#Veto
>>
>> A -1 vote by a qualified voter stops a code-modification proposal in its
>>> tracks. This constitutes a veto, and it cannot be overruled nor overridden
>>> by anyone. Vetoes stand until and unless the individual withdraws their
>>> veto.
>>>
>>> To prevent vetoes from being used capriciously, the voter must provide
>>> with the veto a technical justification showing why the change is bad
>>> (opens a security exposure, negatively affects performance, etc. ). A veto
>>> without a justification is invalid and has no weight.
>>
>>
>> For sure, the technical justification must be "objective", otherwise it
>> means if I'm a PMC member I can veto everything if I don't like it.
>>
>> The main argument here about "vendor name in the codebase" is NOT
>> something we have ever seen disallowing this in ASF policy. If there is
>> evidence, it will immediately kill the two VOTEs as it is enough objective
>> argument. But no one was able to bring this up. Please remember, the fact
>> "vendor name in the codebase is bad for any reason", is proven to be NOT an
>> "objective" claim, otherwise how the DISCUSS and VOTE were almost passing
>> with support from PMC members?
>>
>> I really suggest everyone who casts a vote in this VOTE thread, to be
>> based on "objective" rationale. For example, we tend to consider < 10 lines
>> of code to be very trivial to maintain, so the argument of maintenance
>> burden does not apply here. Like this.
>>
>>
>> On Sun, Mar 16, 2025 at 8:38 AM Mark Hamstra 
>> wrote:
>>
>>> There has been enough discussion on this topic already, so I think
>>> that an immediate vote on the validity of Dongjoon's technical
>>> justification for his veto of the "Retain migration logic ... in Spark
>>> 4.0.x" proposal is in order. That technical justification has been
>>> called into question, and the guidance at
>>> https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>>> PMC to determine whether the technical justification is  valid: "In
>>> case of doubt, deciding whether a technical justification is valid is
>>> up to the PMC." As such, only PMC votes will decide the outcome of
>>> this vote. This is neither a vote on a code change itself nor a vote
>>> on whether a package is ready for release, so it is a procedural vote on
>>> whether the technical justification is valid. As such, the vote will
>>> be decided by a simple majority where +1 votes hold that the technical
>>> justification is not valid and -1 votes hold that the technical
>>> justification is valid.
>>>
>>> I would request that at least PMC members post more than just a naked
>>> vote, but instead endeavor to give some reason why they have assessed
>>> the technical justification as they have. I'll start:
>>>
>>> Despite all of the discussion related to Dongjoon's -1 vote, I must
>>> confess to still not being entirely clear on what is his technical
>>> justification for that veto. I see claims that including an admonition
>>> in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>>> required to maintain the integrity of already existing da

Re: [DISCUSS] Involve any hack / workaround to not include vendor name in migration logic

2025-03-16 Thread Mich Talebzadeh
Hi Jungtaek.

With regard to your point below

"...Hi dev, I'm really tired of the discussion which does not move forward
because the argument is not backed by strict ASF policy"

Regardless, we all appreciate your efforts and your tenacity.

cheers


Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR


On Sun, 16 Mar 2025 at 08:26, Jungtaek Lim 
wrote:

> Yeah... maybe 1 is simpler if there is no side effect, and probably the
> latter pattern we have is long enough to figure out aliases without full
> text matching.
>
> On Sun, Mar 16, 2025 at 5:15 PM Mark Hamstra 
> wrote:
>
>> Doing something like pattern matching on
>> \u0064\u0061\u0074\u0061\u0062\u0072\u0069\u0063\u006b\u0073 instead of
>> “databricks” might also be an option if including “databricks” in the code
>> is believed to be so offensive.
>>
>>
>> On Sun, Mar 16, 2025 at 12:52 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi dev,
>>>
>>> I'm really tired of the discussion which does not move forward because
>>> the argument is not backed by strict ASF policy. We debate based on
>>> the interpretation of ASF policy by individuals, which I think makes zero
>>> sense.
>>>
>>> I really thought about this a lot how to resolve this, and now I'm open
>>> to have a hack on the migration logic to eliminate the main concern on the
>>> debate, which I can think like following:
>>>
>>> 1) Checks the config string via pattern, like the string contains
>>> ".optimizer.pruneFiltersCanPruneStreamingSubplan" at the end and the string
>>> is longer than "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
>>> (or strictly 12 chars longer).
>>>
>>> 2) Encode the incorrect config name with any hashing algorithm, and use
>>> this to compare with. We do not document about the origin string, but
>>> probably leave the offending ticket to at least figure out when we forgot
>>> about  what was the origin string by ourselves when we ever debug this.
>>>
>>> 3) etcetc (I'm open for better ideas, except just removing the migration
>>> logic.)
>>>
>>> In overall, indirect comparison.
>>>
>>> This definitely complicates the logic, but the logic is really just 4
>>> lines to begin with, and maybe it will be just 10 lines or so, so it's
>>> still something manageable.
>>>
>>> It might be slightly tricky on testing, but this is a lot easier than
>>> debating in there. I'm now regretting not allowing myself to introduce a
>>> hack in earlier days - we should have saved 3 weeks.
>>>
>>> If we are OK to allow a bit of indirect checking on the logic to just
>>> remove out the debate, I'm happy to do that. It's obvious that we can just
>>> leave this migration logic much longer than what I was proposing, because
>>> we eliminate the main concern.
>>>
>>> I'm open to hear about support and objections.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
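
For reference, options 1 and 2 from the quoted proposal could be sketched
roughly as below. This is only an illustration in Python, not Spark's actual
migration code: the function names, the dict-based config, and the
"spark.examplecorp..." key are hypothetical placeholders. Only the canonical
key and the suffix are taken from the thread.

```python
import hashlib

# Taken from the thread: the correctly named config and the suffix to match.
CANONICAL_KEY = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
SUFFIX = ".optimizer.pruneFiltersCanPruneStreamingSubplan"

# Hypothetical stand-in for the misnamed key (NOT the real string).
PLACEHOLDER_LEGACY_KEY = (
    "spark.examplecorp.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
)

# Assumption: a real change would hard-code this digest as a literal so the
# original string never appears in the source. It is computed here only to
# keep the sketch self-contained.
LEGACY_KEY_SHA256 = hashlib.sha256(
    PLACEHOLDER_LEGACY_KEY.encode("utf-8")
).hexdigest()


def matches_by_pattern(key: str) -> bool:
    """Option 1: same suffix, but longer than the canonical key."""
    return key.endswith(SUFFIX) and len(key) > len(CANONICAL_KEY)


def matches_by_hash(key: str) -> bool:
    """Option 2: compare a digest of the key with a stored digest."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return digest == LEGACY_KEY_SHA256


def migrate(conf: dict, matcher) -> dict:
    """Return a copy of conf with any matching key renamed to CANONICAL_KEY."""
    migrated = {}
    for key, value in conf.items():
        if matcher(key):
            # An explicitly set canonical key always wins over the legacy one.
            migrated.setdefault(CANONICAL_KEY, value)
        else:
            migrated[key] = value
    return migrated
```

The trade-off between the two: the pattern check is readable but would also
match any other longer key that happens to share the suffix, while the hash
check matches exactly one string but is opaque unless the origin is recorded
somewhere (e.g. the ticket, as the proposal suggests).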


Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
-1 (binding). To me it doesn’t matter that the cost is low: if the
objection is technical, then I think we need to respect the veto. There is a
fundamental disagreement as to what the correct technical way to address
this problem is (removal + documentation vs legacy config), and a PMC member
has vetoed the legacy config option.

I disagree with Mark’s assertion that the veto needs to rest on a
“substantial technical concern”; I think it only needs a valid concern. In
addition to the veto, they’ve also gone above and beyond in providing
alternative ways to accomplish this.

On a personal level:

I am optimistic we can unblock the release but I think it’s important to
err on the side of respecting the veto here in the interest of perceived
fairness *especially* because of vendor aspects.

To be clear, I’ve worked at most of these companies (and with many of the
people), and I’m not ascribing malice to anyone in this. I think mistakes
happen (god knows I’ve had my fair share). I think we’re all doing our best
here and would ask that we show everyone understanding regardless of the outcome.

Sending hugs and good vibes to y’all.

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Sat, Mar 15, 2025 at 5:07 PM Holden Karau  wrote:

> Given it’s the weekend maybe let’s give folks at least one full work day.
>
> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra 
> wrote:
>
>> Quick administrative note: I don't see any reason why this vote should
>> take a long time, so I expect to close the process and tally the votes
>> in not much more than 48 hours.
>>
>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra 
>> wrote:
>> >
>> > There has been enough discussion on this topic already, so I think
>> > that an immediate vote on the validity of Dongjoon's technical
>> > justification for his veto of the "Retain migration logic ... in Spark
>> > 4.0.x" proposal is in order. That technical justification has been
>> > called into question, and the guidance at
>> > https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>> > PMC to determine whether the technical justification is valid: "In
>> > case of doubt, deciding whether a technical justification is valid is
>> > up to the PMC." As such, only PMC votes will decide the outcome of
>> > this vote. This is neither a vote on a code change itself nor a vote
>> > on whether a package is ready for release, so it is a procedural vote on
>> > whether the technical justification is valid. As such, the vote will
>> > be decided by a simple majority where +1 votes hold that the technical
>> > justification is not valid and -1 votes hold that the technical
>> > justification is valid.
>> >
>> > I would request that at least PMC members post more than just a naked
>> > vote, but instead endeavor to give some reason why they have assessed
>> > the technical justification as they have. I'll start:
>> >
>> > Despite all of the discussion related to Dongjoon's -1 vote, I must
>> > confess to still not being entirely clear on what is his technical
>> > justification for that veto. I see claims that including an admonition
>> > in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>> > required to maintain the integrity of already existing data streams,
>> > and I see assertions about the maintenance burden that including the
>> > migration logic would impose on future Spark versions, but I don't
>> > think that I see any other technical objections. I do not believe that
>> > the claimed technical justification is valid.
>> >
>> > In requiring that a veto of a code change be accompanied by a
>> > technical justification for the veto, the Apache Voting Process states
>> > that: "To prevent vetoes from being used capriciously, the voter must
>> > provide with the veto a technical justification showing why the change
>> > is bad (opens a security exposure, negatively affects performance,
>> > etc. ). A veto without a justification is invalid and has no weight."
>> > This strongly implies that there must be something objectively wrong
>> > with the proposed code change in that it causes significant harm in
>> > the way of opening a security exposure, negatively affecting
>> > performance, or presumably other significant user harms or perhaps
>> > even developer burdens.
>> >
>> > The proposed addition of the migration logic to Spark 4.0.x does n

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Mich Talebzadeh
I second Holden's point.

In fairness, anything published in a public forum like this one is fair game
for analysis or criticism, that is, for analyzing, classifying,
interpreting, or evaluating the technical matter. If someone makes a claim
on a technical or procedural matter in an open forum, then that person is
expected to back it up. *I cannot see how anyone could object to the
statement: if you make a claim or have a strong opinion, be prepared to
prove it or debate it.* Regardless, as stated, mistakes can and do happen.

 HTH
Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

On Sun, 16 Mar 2025 at 21:32, Holden Karau  wrote:

> -1 (binding) — to me it doesn’t matter that the cost is low if the
> objection is technical then I think we need to respect the veto. There is a
> fundamental disagreement as to what the correct technical way to address
> this problem is (removal + documentation vs legacy config) and a PMC member
> has vetoed  the legacy config option.
>
> I think I disagree with Mark on the assertion that the veto needs to have
> “substantial technical concern,” but rather a valid concern. I think in
> addition to the veto they’ve also gone above and beyond providing
> alternative ways to accomplish this.
>
> On a personal level:
>
> I am optimistic we can unblock the release but I think it’s important to
> err on the side of respecting the veto here in the interest of perceived
> fairness *especially* because of vendor aspects.
>
> To be clear I’ve worked at most of these companies (and many of the
> people) and I’m not ascribing malice to anyone in this, I think mistakes
> happen (god knows I’ve had a fair share). I think we’re all doing our best
> here and would ask that we show everyone understanding regardless of the
> outcome.
>
> Sending hugs and good vibes to y’all.
>
> On Sat, Mar 15, 2025 at 5:07 PM Holden Karau 
> wrote:
>
>> Given it’s the weekend maybe let’s give folks at least one full work day.
>>
>> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra 
>> wrote:
>>
>>> Quick administrative note: I don't see any reason why this vote should
>>> take a long time, so I expect to close the process and tally the votes
>>> in not much more than 48 hours.
>>>
>>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra 
>>> wrote:
>>> >
>>> > There has been enough discussion on this topic already, so I think
>>> > that an immediate vote on the validity of Dongjoon's technical
>>> > justification for his veto of the "Retain migration logic ... in Spark
>>> > 4.0.x" proposal is in order. That technical justification has been
>>> > called into question, and the guidance at
>>> > https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>>> > PMC to determine whether the technical justification is valid: "In
>>> > case of doubt, deciding whether a technical justification is valid is
>>> > up to the PMC." As such, only PMC votes will decide the outcome of
>>> > this vote. This is neither a vote on a code change itself nor a vote
>>> > on whether a package is ready for release, so it is a procedural vote on
>>> > whether the technical justification is valid. As such, the vote will
>>> > be decided by a simple majority where +1 votes hold that the technical
>>> > justification is not valid and -1 votes hold that the technical
>>> > justification is valid.
>>> >
>>> > I would request that at least PMC members post more than just a naked
>>> > vote, but instead endeavor to give some reason why they have assessed
>>> > the technical justification as they have. I'll start:
>>> >
>>> > Despite all of the discussion related to Dongjoon's -1 vote, I must
>>> > confess to still not being entirely clear on what is his technical
>>> > justification for that veto. I see claims that including an admonition
>>> > in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>>> > required to maintain the integrity of already existing data streams,
>>> > and I see assertions about the maintenance burden that including the
>>> > migration logic would impose on future Spark versions, but I don't
>>> > think that I see any other technical objections. I do not 

Re: [DISCUSS] New Spark Connect Client repository for Swift language

2025-03-16 Thread Mich Talebzadeh
+1 Sounds like a plan

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

On Sun, 16 Mar 2025 at 21:10, Martin Grund 
wrote:

> So I was just playing with the Swift client to build a Mac app and
> everything worked nicely! However, similar to the Go repository, my
> suggestion would be to handle issues directly in Github and not in Jira
> because the vast majority of initial issues will be simply compatibility
> issues. Creating them in Jira instead of GH, where I already am, is going
> to be a big overhead.
>
> What do you think?
>
> In the Go client, we follow the approach that cross-language issues go to
> Jira, whereas client-specific ones go directly into Github.
>
> On Tue, Mar 11, 2025 at 3:28 AM Jules Damji  wrote:
>
>> + 1 (non-binding)
>>
>> Generally speaking, it’s a good idea to separate repositories for all
>> Spark Connect clients under Spark.
>> - better organization
>> - better visibility
>> - easier for contribution
>> - better for growth & extension of Spark Connect ecosystem
>>
>> Cheers
>> Jules
>> —
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> > On Mar 10, 2025, at 4:37 PM, Dongjoon Hyun  wrote:
>> >
>> > Thank you everyone for your support.
>> >
>> > New Apache Spark repository is created at the proposed location with
>> > ASF license and open for `Spark Connect Client for Swift language`
>> > contributions.
>> >
>> > https://github.com/apache/spark-connect-swift
>> >
>> > FYI, this repository will be managed in the same way with
>> > `spark-kubernetes-operator` repository.
>> >
>> > Thank you again.
>> >
>> > Dongjoon.
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
I believe he intended to VETO this change for Spark 4. Of course, if
Dongjoon did not (or no longer intends to), then this VOTE becomes moot. I
think bringing up 3.5.5 confuses the issue -- this vote thread is very
clearly about VETOing the change for Spark 4.

I think that accusing each other of acting in bad faith is unproductive for
resolving this dispute.

On Sun, Mar 16, 2025 at 3:14 PM Jungtaek Lim 
wrote:

> Holden and Dongjoon,
>
> Let me make this vote super simple. I never got the answer from Dongjoon
> about this question. This is super important because if he's casting veto
> "to block", it is a strong indication that he was intended to play with me,
> which I am seriously considering escalating the problem (If this is true,
> it's no longer just a justification of vote, but someone's
> behavioral issue).
>
> https://github.com/apache/spark/pull/49983
>
> I might be missing another timeline, but, if you follow the conversation
> here, there are some facts:
>
> 1. Dongjoon "knew" we were never decided about the direction of Spark
> 4.0.0 behavior. (link)
> 2. Dongjoon "agreed" my proposal is technically correct. (link)
> 3. Dongjoon "agreed" to hear from the community about discussing my
> proposal. (link)
>
> Worth clarifying, 3 happened after we discussed the removal of "config".
> Dongjoon continuously mixed up the fact - while we were in agreement of
> removal of config, removal of migration logic was definitely left to open
> question. Let me give the VOTE Dongjoon drove and made it pass.
>
> https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5
>
> This was totally about 3.5.5. If Dongjoon thinks this simply applies to
> Spark 4.0.0+, it's not, no?
>
> Also, let's revisit the discussion we were discussing about removal of
> config.
>
> https://lists.apache.org/thread/qxqzt7wbtdyxp17d7s1rxhnrswdccsgb
>
> Dongjoon clearly stated that we only make a consensus about Spark 3.5.5,
> and we can continue discussion about the proper behavior in Spark 4.0.0.
> That is the rationale I drove my own discussion. I can be corrected, but
> there is NO discussion/vote w.r.t this topic AFAIK.
>
> Dongjoon, now it's your time to prove there is a valid reason to change
> your mind during this time frame. If the above are all true, you are
> already indicating that you can never cast a veto. (Or show me the evidence
> of how you change your mind for which reason.) If any of the above are
> something you intended to not tell the truth, I am really not sure your
> comment will be truthful I can follow. Especially, if you did not tell the
> truth from 3, e.g. you let me go and discuss while you were intended to
> block me in any phase, this is a strong indication that you intend to play
> with me and the community (or even ASF) has to know that.
>
> Do not evade the root question.
>
> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau 
> wrote:
>
>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>> objection is technical then I think we need to respect the veto. There is a
>> fundamental disagreement as to what the correct technical way to address
>> this problem is (removal + documentation vs legacy config) and a PMC member
>> has vetoed  the legacy config option.
>>
>> I think I disagree with Mark on the assertion that the veto needs to have
>> “substantial technical concern,” but rather a valid concern. I think in
>> addition to the veto they’ve also gone above and beyond providing
>> alternative ways to accomplish this.
>>
>> On a personal level:
>>
>> I am optimistic we can unblock the release but I think it’s important to
>> err on the side of respecting the veto here in the interest of perceived
>> fairness *especially* because of vendor aspects.
>>
>> To be clear I’ve worked at most of these companies (and many of the
>> people) and I’m not ascribing malice to anyone in this, I think mistakes
>> happen (god knows I’ve had a fair share). I think we’re all doing our best
>> here and would ask that we show everyone understanding regardless of the
>> outcome.
>>
>> Sending hugs and good vibes to y’all.
>>
>> On Sat, Mar 15, 2025 at 5:07 PM Holden Karau 
>> wrote:
>>
>>> Given it’s the weekend maybe let’s give folks at least one full work day.
>>>

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
It's OK, I just wanted him to give the link whenever he makes the
argument. It does NOT matter if he brings up new reasoning at this
point; it should have been there in the VOTE.

Also, I think we all agree that this is NOT the place to debate having
vendor names in the codebase. I think I gave PMC members plenty of time
during the DISCUSSION and VOTE to chime in about their concerns.

As a reminder, this VOTE is to judge Dongjoon's VETO. Again, Mark
clarified the criteria for a VETO by quoting the ASF page:

>
> In requiring that a veto of a code change be accompanied by a
> technical justification for the veto, the Apache Voting Process states
> that: "To prevent vetoes from being used capriciously, the voter must
> provide with the veto a technical justification showing why the change
> is bad (opens a security exposure, negatively affects performance,
> etc. ). A veto without a justification is invalid and has no weight."
> This strongly implies that there must be something objectively wrong
> with the proposed code change in that it causes significant harm in
> the way of opening a security exposure, negatively affecting
> performance, or presumably other significant user harms or perhaps
> even developer burdens.


For anyone weighing in on that judgment, the above must be the central
reasoning. If a claim strays from these VETO criteria, it just extends the
previous VOTE, which is exactly what this VOTE was initiated to "avoid".

On Mon, Mar 17, 2025 at 7:40 AM Holden Karau  wrote:

> I believe he intended to VETO this change for Spark 4, of course if
> Dongjoon did not (or no longer intends to) then this VOTE becomes moot. I
> think bringing up 3.5.5 confuses the issue -- this vote thread is very
> clearly about VETOing the change for Spark 4.
>
> I think that accusing each-other of acting in bad faith is unproductive to
> resolving this dispute.
>
> On Sun, Mar 16, 2025 at 3:14 PM Jungtaek Lim 
> wrote:
>
>> Holden and Dongjoon,
>>
>> Let me make this vote super simple. I never got the answer from Dongjoon
>> about this question. This is super important because if he's casting veto
>> "to block", it is a strong indication that he was intended to play with me,
>> which I am seriously considering escalating the problem (If this is true,
>> it's no longer just a justification of vote, but someone's
>> behavioral issue).
>>
>> https://github.com/apache/spark/pull/49983
>>
>> I might be missing another timeline, but, if you follow the conversation
>> here, there are some facts:
>>
>> 1. Dongjoon "knew" we were never decided about the direction of Spark
>> 4.0.0 behavior. (link)
>> 2. Dongjoon "agreed" my proposal is technically correct. (link)
>> 3. Dongjoon "agreed" to hear from the community about discussing my
>> proposal. (link)
>>
>> Worth clarifying, 3 happened after we discussed the removal of "config".
>> Dongjoon continuously mixed up the fact - while we were in agreement of
>> removal of config, removal of migration logic was definitely left to open
>> question. Let me give the VOTE Dongjoon drove and made it pass.
>>
>> https://lists.apache.org/thread/6nn76olr65b8zfgzdcbtr9f6o98451o5
>>
>> This was totally about 3.5.5. If Dongjoon thinks this simply applies to
>> Spark 4.0.0+, it's not, no?
>>
>> Also, let's revisit the discussion we were discussing about removal of
>> config.
>>
>> https://lists.apache.org/thread/qxqzt7wbtdyxp17d7s1rxhnrswdccsgb
>>
>> Dongjoon clearly stated that we only make a consensus about Spark 3.5.5,
>> and we can continue discussion about the proper behavior in Spark 4.0.0.
>> That is the rationale I drove my own discussion. I can be corrected, but
>> there is NO discussion/vote w.r.t this topic AFAIK.
>>
>> Dongjoon, now it's your time to prove there is a valid reason to change
>> your mind during this time frame. If the above are all true, you are
>> already indicating that you can never cast a veto. (Or show me the evidence
>> of how you change your mind for which reason.) If any of the above are
>> something you intended to not tell the truth, I am really not sure your
>> comment will be truthful I can follow. Especially, if you did not tell the
>> truth from 3, e.g. you let me go and discuss while you were intended to
>> block me in any phase, this is a strong indication that you intend to play
>> with me and the community (or even ASF) has to know that.
>>
>> Do not evade the root question.
>>
>> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau 
>> wrote:
>>
>>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>>> objection is technical then I think we need to respect the veto. There is a
>>> fundamental disagreement as to what the correct technical way to address
>>> this problem is (removal + documentation vs legacy conf

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Reynold Xin
Thanks Mark for starting this. +1 and agree with your reasoning.

Wearing the Apache Spark PMC hat, I think having a few lines of
straightforward logic to ease users' migrations is a no-brainer to do.
Imagine how confused a user would be when they upgraded to 4.0 and things
stopped working in a way that's not obvious ...

Wearing the Databricks hat, I honestly don't care about this issue at all.
Our customers will have no trouble upgrading regardless of this change.



On Sat, Mar 15, 2025 at 4:38 PM Mark Hamstra  wrote:

> There has been enough discussion on this topic already, so I think
> that an immediate vote on the validity of Dongjoon's technical
> justification for his veto of the "Retain migration logic ... in Spark
> 4.0.x" proposal is in order. That technical justification has been
> called into question, and the guidance at
> https://www.apache.org/foundation/glossary.html#Veto leaves it to the
> PMC to determine whether the technical justification is valid: "In
> case of doubt, deciding whether a technical justification is valid is
> up to the PMC." As such, only PMC votes will decide the outcome of
> this vote. This is neither a vote on a code change itself nor a vote
> on whether a package is ready for release, so it is a procedural vote on
> whether the technical justification is valid. As such, the vote will
> be decided by a simple majority where +1 votes hold that the technical
> justification is not valid and -1 votes hold that the technical
> justification is valid.
>
> I would request that at least PMC members post more than just a naked
> vote, but instead endeavor to give some reason why they have assessed
> the technical justification as they have. I'll start:
>
> Despite all of the discussion related to Dongjoon's -1 vote, I must
> confess to still not being entirely clear on what is his technical
> justification for that veto. I see claims that including an admonition
> in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
> required to maintain the integrity of already existing data streams,
> and I see assertions about the maintenance burden that including the
> migration logic would impose on future Spark versions, but I don't
> think that I see any other technical objections. I do not believe that
> the claimed technical justification is valid.
>
> In requiring that a veto of a code change be accompanied by a
> technical justification for the veto, the Apache Voting Process states
> that: "To prevent vetoes from being used capriciously, the voter must
> provide with the veto a technical justification showing why the change
> is bad (opens a security exposure, negatively affects performance,
> etc. ). A veto without a justification is invalid and has no weight."
> This strongly implies that there must be something objectively wrong
> with the proposed code change in that it causes significant harm in
> the way of opening a security exposure, negatively affecting
> performance, or presumably other significant user harms or perhaps
> even developer burdens.
>
> The proposed addition of the migration logic to Spark 4.0.x does not
> cause any harm to Spark's users. For many users, those not using
> streaming data, the change will have no effect. For streaming users
> the change will be beneficial, not harmful.
>
> Neither do I find the claim of excessive, ongoing developer burden to
> be persuasive. The changes are tiny and easily maintained -- in fact,
> it wouldn't surprise me if no further changes to this migration logic
> would be needed for a very long time.
>
> Some of what we are left with is just an expression of preference for
> a technical alternative to the migration logic -- i.e. including in
> the release notes an admonition to first upgrade to 3.5.5. But the
> Apache Voting Process does not say that in the face of code
> alternatives A and B, a qualified voter is justified in vetoing A if
> they prefer B. Instead, the Voting Process strongly implies that
> something more is needed to justify a veto, as I've already covered.
> Thus I don't find Dongjoon's preference for the release notes option
> to be adequate justification for the veto.
>
> The only remaining question I see is whether including "databricks" in
> the Apache Code is ever allowed or if any such instance must be
> expunged as soon as possible. I am not aware of any ASF policy that
> strictly forbids the mention of a vendor in Apache code for any
> reason, even if that vendor has a product based on Apache code, even
> if that vendor enjoys a uniquely influential position vis a vis some
> Apache code or project. Certainly the PMC has a duty to see to it that
> neither Databricks nor any other vendor exercises influence or control
> over Apache Spark outside of the established Apache process, but the
> proposed migration code changes do not advantage Databricks -- if
> anything they remove a minor avenue of influence, and simply need to
> mention "databricks" once in order match and transform a c

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Holden, I think I have a workaround, which I posted in dev@ (link). It
is definitely worse than this proposal, so I still want to see this VOTE
go forward, but it helps in this situation because we no longer talk about
the vendor name, hence no need to debate keeping it for more minor
versions.

I still want to hear the community's answer to the question: "Do we really
think that forcing users to take the upgrade path we prescribed makes
sense?" I'm not strongly arguing that the vendor name has to be in the
codebase (though if that is the easier way, we should weigh it), but I do
strongly argue that we have no control over users, so we must always think
about how to let users do whatever they want with minimal inconvenience
(since we can't always avoid breaking things). This meta question is my
strongest argument, and I'm -1 on Dongjoon's proposal because it breaks
this principle.

I strongly believe there is NO way to force users to upgrade to a specific
version. I think we should have sought alternatives when this came up with
the main proposal.

On Mon, Mar 17, 2025 at 12:10 PM Holden Karau wrote:

> I'm delighted to see folks talking about a compromise. However, instead of
> just asking Dongjoon to withdraw the VETO, perhaps folks can suggest
> alternatives that might meet some of both parties' goals?
>
> On Sun, Mar 16, 2025 at 7:41 PM Wenchen Fan  wrote:
>
>> I agree with Holden that withdrawing a veto is always better than
>> overriding it: it's healthier for the community. Dongjoon, would you be
>> willing to reconsider your veto given the current as-is state of the 4.0.0
>> release (the breaking change will be reverted)?
>>
>> On Mon, Mar 17, 2025 at 10:36 AM Wenchen Fan  wrote:
>>
>>> I've created the revert PR for branch-4.0:
>>> https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
>>> consensus but it's clear that this breaking change PR has failed to achieve
>>> consensus.
>>>
>>> I hope we now have a clear foundation for discussing solutions. As it
>>> stands, the misnamed configuration will be released in 4.0.0. I like
>>> Jungtaek’s proposal to deprecate it, but the decision is up to the
>>> community.
>>>
>>> On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> OK, let's be super honest.
>>>>
>>>> Again, I think you agree that *"both" proposals are "technically"
>>>> correct (or one side can't have a strong theoretical evidence to counter
>>>> the other side)*. So this naturally has a fate to have more supporters
>>>> to get to the end. It's very easy for me to VETO to his proposal (although
>>>> I don't have a binding vote, I think I have people who agree with me) if we
>>>> think we want to definitely expand the interpretation of VETO criteria in
>>>> the Apache Voting Process.
>>>>
>>>> You said it is up to the PMC member exercising the veto to use their
>>>> judgement, but definitely, it must not be used to force the community to
>>>> follow his proposal. The major argument here is, he can just VETO to any
>>>> proposal to retain the codebase as the way he prefers to, which I don't
>>>> believe is a correct usage of VETO.
>>>>
>>>> If we just revert the change of removal of config, this is "really"
>>>> neutral neither my proposal nor his proposal. Do we really want to do so?
>>>>
>>>> On Mon, Mar 17, 2025 at 10:55 AM Holden Karau
>>>> wrote:
>>>>
>>>>> First let me start with my key hope:
>>>>>
>>>>> We find a way to compromise and have the veto withdrawn rather than
>>>>> overridden.
>>>>>
>>>>> From what I understand of the change in question:
>>>>>
>>>>> So my understanding, and I may be over simplifying here but there are
>>>>> (at least) three technical paths forward (migration guide, legacy config
>>>>> with vendor string in it, non-vendor specific string legacy config), a PMC
>>>>> member vetoed one of them (named vendor legacy config) because he thought a
>>>>> different approach was better (migration guide) as they were worried that
>>>>> carrying that legacy config forward would encourage bad coding standards
>>>>> (eg we would add more vendor named config flags). To me that seems like a
>>>>> valid concern.
>>>>>
>>>>> My reasoning:
>>>>>
>>>>> Thinking back at other VETOs that I’ve been involved with in this
>>>>> project (DSV2, graceful decom, etc) this seems to meet the same bar. Hell
>>>>> we’ve had plenty of vetos that didn’t offer an alternative.
>>>>>
>>>>> My personal understanding of where the bar for “a technical
>>>>> justification showing why the change is bad” concern is
>>>>> pretty much “any not factually incorrect reasoning”, the text doesn’t have
>>>>> any particular “bar” for the level of “badness” and I think it’s up to the
>>>>> PMC member exercising the veto to use their judgement.
>>>>>
>>>>> In closing, I feel like the path we’re going down (overriding a veto)
>>>>> is not healthy for the

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
If we really want to have a "correct" discussion going forward, I
believe the revert PR has to be merged. After that, either my proposal is
not accepted, or he starts a DISCUSS thread and eventually gets a VOTE to
pass, or we just keep the config deprecated instead of removing it.

We don't need to do this right now, because this work is unnecessary if
this VOTE passes. But if this VOTE fails, I argue that the revert PR
must be merged, because a failed VOTE only means that he can block my
proposal. It never means that he got consensus on his proposal. That
VOTE must happen separately, and in the meantime I want the codebase
to be "neutral".

On Mon, Mar 17, 2025 at 11:36 AM Wenchen Fan  wrote:

> I've created the revert PR for branch-4.0:
> https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
> consensus but it's clear that this breaking change PR has failed to achieve
> consensus.
>
> I hope we now have a clear foundation for discussing solutions. As it
> stands, the misnamed configuration will be released in 4.0.0. I like
> Jungtaek’s proposal to deprecate it, but the decision is up to the
> community.
>
> On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> OK, let's be super honest.
>>
>> Again, I think you agree that *"both" proposals are "technically"
>> correct (or one side can't have a strong theoretical evidence to counter
>> the other side)*. So this naturally has a fate to have more supporters
>> to get to the end. It's very easy for me to VETO to his proposal (although
>> I don't have a binding vote, I think I have people who agree with me) if we
>> think we want to definitely expand the interpretation of VETO criteria in
>> the Apache Voting Process.
>>
>> You said it is up to the PMC member exercising the veto to use their
>> judgement, but definitely, it must not be used to force the community to
>> follow his proposal. The major argument here is, he can just VETO to any
>> proposal to retain the codebase as the way he prefers to, which I don't
>> believe is a correct usage of VETO.
>>
>> If we just revert the change of removal of config, this is "really"
>> neutral neither my proposal nor his proposal. Do we really want to do so?
>>
>> On Mon, Mar 17, 2025 at 10:55 AM Holden Karau 
>> wrote:
>>
>>> First let me start with my key hope:
>>>
>>> We find a way to compromise and have the veto withdrawn rather than
>>> overridden.
>>>
>>> From what I understand of the change in question:
>>>
>>> So my understanding, and I may be over simplifying here but there are
>>> (at least) three technical paths forward (migration guide, legacy config
>>> with vendor string in it, non-vendor specific string legacy config), a PMC
>>> member vetoed one of them (named vendor legacy config) because he thought a
>>> different approach was better (migration guide) as they were worried that
>>> carrying that legacy config forward would encourage bad coding standards
>>> (eg we would add more vendor named config flags). To me that seems like a
>>> valid concern.
>>>
>>> My reasoning:
>>>
>>> Thinking back at other VETOs that I’ve been involved with in this
>>> project (DSV2, graceful decom, etc) this seems to meet the same bar. Hell
>>> we’ve had plenty of vetos that didn’t offer an alternative.
>>>
>>> My personal understanding of where the bar for “
>>> a technical justification showing why the change is bad” concern is
>>> pretty much “any not factually incorrect reasoning”, the text doesn’t have
>>> any particular “bar” for the level of “badness” and I think it’s up to the
>>> PMC member exercising the veto to use their judgement.
>>>
>>> In closing, I feel like the path we’re going down (overriding a veto) is
>>> not healthy for the project.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> 
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>>
>>> On Sun, Mar 16, 2025 at 6:28 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Holden, I believe you should already know "both" approaches are
>>>> "technically" correct. It's not about which one you have a preference for,
>>>> no, this VOTE is not intended to extend the debate.
>>>>
>>>> Again, what you are encouraged to do here is, not exposing your
>>>> preference of two approaches, but exposing your "technically valid" concern
>>>> of my approach, backed by Dongjoon's veto (most likely you want to quote
>>>> Dongjoon's post). This is very simple and I'm not sure you are doing
>>>> exactly what the VOTE requires.
>>>>
>>>> On Mon, Mar 17, 2025 at 6:32 AM Holden Karau 
>>>> wrote:
>>>>
>>>>> -1 (binding) — to me it doesn’t matter that the cost is low if the
>>>>> objection is technical then I 

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Dongjoon Hyun
I reviewed Wenchen's revert PR. Although it's a proposal for discussion, it 
is another breaking change against Apache Spark 3.5.5, isn't it? If we consider 
Apache Spark 3.5.4 users, I believe we need to consider Apache Spark 3.5.5 
users too, who already use 
`spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`.

https://github.com/apache/spark/pull/50291

- buildConf("spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
+ buildConf("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")

Dongjoon.

On 2025/03/17 02:36:42 Wenchen Fan wrote:
> I've created the revert PR for branch-4.0:
> https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
> consensus but it's clear that this breaking change PR has failed to achieve
> consensus.
> 
> I hope we now have a clear foundation for discussing solutions. As it
> stands, the misnamed configuration will be released in 4.0.0. I like
> Jungtaek’s proposal to deprecate it, but the decision is up to the
> community.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Btw, I don't think reverting the PR that removed the config violates the
community's consensus, because we never had a VOTE for this breaking change
(it is definitely a breaking change without the migration logic), so we
never really got explicit consensus on it. The real neutral state is that
we do not have the code change removing the config. (I'm open to hearing
objections.)

It was merged on the assumption that we would figure out how NOT to
break backward compatibility, and since this has NOT been addressed, I argue
the change removing the config has to be reverted. Furthermore, reverting
does not violate the community's consensus, since I have strong grounds for
a VETO on that PR: we generally do not consider a breaking change without
any alternative to be a valid change.

I'm happy to go and cast -1 on that PR (a committer should be
considered to have a binding vote on code, otherwise my +1 could not merge
the PR), and revert it unless anyone objects with a valid reason.

On Mon, Mar 17, 2025 at 12:31 PM Jungtaek Lim 
wrote:

> Holden, I think I have a workaround, like the one I posted in dev@ (link),
> which is definitely worse than this proposal, so I still want to see
> this VOTE go forward. But it is somewhat better in this situation in that
> we no longer talk about the vendor name, hence there is no need to debate
> about more minor versions.
>
> Though I still want to hear the community's voice on "Do we really
> think that forcing users to take the upgrade path we prescribe makes
> sense?". I'm not strongly arguing that the vendor name has to be in the
> codebase (but if this is the easier way then we should weigh it), but I
> strongly argue that we have no control over users, so we must always think
> about how we can let users do whatever they want with minimal inconvenience
> (since we can't always avoid breaking things). I think this meta question
> is a strong argument from me, and I'm -1 on Dongjoon's proposal because his
> proposal breaks this.
>
> I strongly believe there is NO way to force users to upgrade to a specific
> version - I think we should have sought alternatives when this came up with
> the main proposal.
>
> On Mon, Mar 17, 2025 at 12:10 PM Holden Karau  wrote:
>
>> I'm delighted to see folks talking about a compromise. However, instead
>> of just asking Dongjoon to withdraw the VETO perhaps folks can suggest
>> alternatives that might meet some of both parties' goals?
>>
>> On Sun, Mar 16, 2025 at 7:41 PM Wenchen Fan  wrote:
>>
>>> I agree with Holden that withdrawing a veto is always better than
>>> overriding it: it's healthier for the community. Dongjoon, would you be
>>> willing to reconsider your veto given the current as-is state of the 4.0.0
>>> release (the breaking change will be reverted)?
>>>
>>> On Mon, Mar 17, 2025 at 10:36 AM Wenchen Fan 
>>> wrote:
>>>
>>>> I've created the revert PR for branch-4.0:
>>>> https://github.com/apache/spark/pull/50291 . We can merge PRs with
>>>> lazy consensus but it's clear that this breaking change PR has failed to
>>>> achieve consensus.
>>>>
>>>> I hope we now have a clear foundation for discussing solutions. As it
>>>> stands, the misnamed configuration will be released in 4.0.0. I like
>>>> Jungtaek’s proposal to deprecate it, but the decision is up to the
>>>> community.
>>>>
>>>> On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> OK, let's be super honest.
>>>>>
>>>>> Again, I think you agree that *"both" proposals are "technically"
>>>>> correct (or one side can't have a strong theoretical evidence to counter
>>>>> the other side)*. So this naturally has a fate to have more
>>>>> supporters to get to the end. It's very easy for me to VETO to his
>>>>> proposal (although I don't have a binding vote, I think I have people
>>>>> who agree with me) if we think we want to definitely expand the
>>>>> interpretation of VETO criteria in the Apache Voting Process.
>>>>>
>>>>> You said it is up to the PMC member exercising the veto to use their
>>>>> judgement, but definitely, it must not be used to force the community to
>>>>> follow his proposal. The major argument here is, he can just VETO to any
>>>>> proposal to retain the codebase as the way he prefers to, which I don't
>>>>> believe is a correct usage of VETO.
>>>>>
>>>>> If we just revert the change of removal of config, this is "really"
>>>>> neutral neither my proposal nor his proposal. Do we really want to do so?
>>>>>
>>>>> On Mon, Mar 17, 2025 at 10:55 AM Holden Karau 
>>>>> wrote:
>>>>>
>>>>>> First let me start with my key hope:
>>>>>>
>>>>>> We find a way to compromise and have the veto withdrawn rather than
>>>>>> overridden.
>>>>>>
>>>>>> From what I understand of the change in question:
>>>>>>
>>>>>> So my understanding, and I may be over simplifying here but there are
>>>>>> (at least) three technical paths forward (migration guide, legacy config

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
The problem you mention is arguably resolvable if we leave the config
name as it is and add
withAlternative("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan").
Let's not nitpick about a literal revert of the PR. We have to revert the PR
"semantically".
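
The fallback idea above can be sketched roughly as follows. This is a
simplified, self-contained model and not Spark's actual SQLConf code:
ConfEntry, readFrom and AlternativeKeyDemo are hypothetical names, and the
default value of true is an assumption made only for illustration. It shows
the correctly named key being read first and the misnamed key being consulted
only when the correct one is absent.

```scala
// Hypothetical, simplified model of a config entry with fallback keys.
// NOT Spark's SQLConf implementation; all names here are illustrative.
final case class ConfEntry(key: String, alternatives: Seq[String], default: Boolean) {
  // Try the primary key first, then each alternative in order, then the default.
  def readFrom(settings: Map[String, String]): Boolean =
    (key +: alternatives)
      .collectFirst { case k if settings.contains(k) => settings(k).toBoolean }
      .getOrElse(default)
}

object AlternativeKeyDemo {
  val Correct = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
  val Misnamed = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"

  // Keep the correct name as the primary key, accept the misnamed one as a fallback.
  // The default of true is an assumption for this sketch.
  val entry = ConfEntry(Correct, Seq(Misnamed), default = true)

  def main(args: Array[String]): Unit = {
    // Settings that carry only the misnamed key (the 3.5.4 case in this thread).
    println(entry.readFrom(Map(Misnamed -> "false"))) // false
    // Settings that carry the correct key (the 3.5.5 case in this thread).
    println(entry.readFrom(Map(Correct -> "false")))  // false
    // Neither key set: fall back to the default.
    println(entry.readFrom(Map.empty))                // true
  }
}
```

With this shape, settings written under either key name resolve to the same
entry, which is the sense in which the revert would be "semantic" rather than
a literal rename.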

Btw, my understanding of how we handle backport PRs is that we mostly merge
to master first and then go down the version line. When my migration PR was
handled, this practice was not followed, and without any discussion. I
submitted PRs for master/4.0/3.5, and 3.5 was merged "first", and when I
asked for merging to 4.0, I was pushed back. I don't think this is an
ordinary way of dealing with PRs for multiple versions, especially
since the author never agreed with this way of processing.

If we had followed the practice, we would already have the migration logic
in master/4.0/3.5, since the same fix would be in master/4.0. We would
probably have discussed removing the config from master/4.0, and we
probably would have agreed to remove it since we would still have the
migration logic. W.r.t. the migration logic, based on the discussion we
are having, we probably couldn't reach an agreement to take it out, so
arguably the migration logic would be left as it is.

This way I would never have needed to drive such a long and sensitive
DISCUSSION and VOTE. But the PRs for master and 4.0 weren't merged because
of an individual's belief about the rollout plan, which, as we are now
seeing, does not match the majority of voices.

Shall we fix the broken process, in which we made a huge mistake, before
moving on? We should have merged the same content as in 3.5 to master/4.0 as
well, and then had a PR to remove the config. The order was totally swapped,
which does not make sense to me.

On Mon, Mar 17, 2025 at 1:57 PM Dongjoon Hyun  wrote:

> I reviewed Wenchen's reverting PR. Although it's a proposal for
> discussion, it is another breaking change against Apache Spark 3.5.5, isn't
> it? If we consider Apache Spark 3.5.4 users, I believe we need to consider
> Apache Spark 3.5.5 users too which uses
> `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` already.
>
> https://github.com/apache/spark/pull/50291
>
> - buildConf("spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
> + buildConf("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan")
>
> Dongjoon.
>
> On 2025/03/17 02:36:42 Wenchen Fan wrote:
> > I've created the revert PR for branch-4.0:
> > https://github.com/apache/spark/pull/50291 . We can merge PRs with lazy
> > consensus but it's clear that this breaking change PR has failed to
> > achieve consensus.
> >
> > I hope we now have a clear foundation for discussing solutions. As it
> > stands, the misnamed configuration will be released in 4.0.0. I like
> > Jungtaek’s proposal to deprecate it, but the decision is up to the
> > community.