Re: [VOTE] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

Jungtaek Lim Thu, 13 Mar 2025 00:16:48 -0700

Dongjoon,

I wonder whether I can persuade you to revisit what has happened. We are
here because we have two different approaches and have agreed to disagree
on them. We posted the discussion to dev@ because we wanted to hear from
third parties. In the DISCUSSION and VOTE threads, I'm seeing support for
my proposal, including from people who are not Databricks employees. They
are users of Apache Spark, and we have a responsibility to hear from them.


I'd like to see whether you can revisit this in light of people's
preferences on this topic. Personally, I don't think this warrants a VOTE,
so I'm happy to take whatever is the simplest and fastest way to resolve
the issue, including cancelling the VOTE if we have consensus.

I hope we can avoid spending more time on this.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Mon, Mar 10, 2025 at 11:53 PM Dongjoon Hyun <[email protected]> wrote:

> -1 because there exists a feasible migration path for Apache Spark 3.5.4
> via Apache Spark 3.5.5.
>
> It's obvious that this Databricks' mistake already causes a huge
> communication cost in the Apache Spark community and is suggesting a burden
> to enforce us to handle at least two more PRs at 4.0.0 and 4.1.0.
>
> Given that, I don't think
> - This is an inevitable or
> - This is 0 cost
>
> Dongjoon.
>
> On 2025/03/10 12:46:16 Jungtaek Lim wrote:
> > Starting from my +1 (non-binding).
> >
> > In addition, I propose to retain migration logic till Spark 4.1.x and
> > remove it in Spark 4.2.0.
> >
> > On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <[email protected]> wrote:
> >
> > > Hi dev,
> > >
> > > Please vote to retain migration logic of incorrect `spark.databricks.*`
> > > configuration in Spark 4.0.x.
> > >
> > > - DISCUSSION:
> > > https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
> > > ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in
> > > Spark 4.0.0+)
> > >
> > > Specifically, please review this post
> > > https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k which
> > > explains pros and cons about the proposal - proposal is about "Option 1".
> > >
> > > Simply speaking, this vote is to allow streaming queries which had been
> > > ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having
> > > to be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will
> > > help users to have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x,
> > > which would be almost 1 year.
> > >
> > > The (only) cons in this option is having to retain the incorrect
> > > configuration name as "string" in the codebase a bit longer. The code
> > > complexity of migration logic is arguably trivial. (link
> > > <https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183>)
> > >
> > > This VOTE is for Spark 4.0.x, but if someone supports including migration
> > > logic to be longer than Spark 4.0.x, please cast +1 here and leave the
> > > desired last minor version of Spark to retain this migration logic.
> > >
> > > The vote is open for the next 72 hours and passes if a majority +1 PMC
> > > votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Retain migration logic of incorrect `spark.databricks.*`
> > > configuration in Spark 4.0.x
> > > [ ] -1 Remove migration logic of incorrect `spark.databricks.*`
> > > configuration in Spark 4.0.0 because...
> > >
> > > Thanks!
> > > Jungtaek Lim (HeartSaVioR)
