Thank you, Anton.

I have also been supportive of this effort and have tried to help with the 
whole feature and its subtasks.

SPARK-54274 Support `MERGE INTO` Schema Evolution

However, it's hard to believe that the nested-field area is robustly tested 
and stable within the Apache Spark 4.1.0 timeframe. As you know, we already 
merged and then urgently reverted the same code during the voting period. I'm 
still not sure we have built enough community agreement on that new code.

Since "spark.sql.mergeNestedTypeCoercion.enabled" is disabled not only in 
Apache Spark 4.1.0 but also, still, in Apache Spark 4.2.0, let's give the 
global Apache Spark community more time to evaluate that nested-field area in 
the Apache Spark 4.2.0 timeframe.
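For anyone who wants to help evaluate the disabled path during that timeframe, a minimal sketch of flipping the flag in a local session might look like the following. Only the config key comes from this thread (and its default remains false); the table names and the MERGE statement are hypothetical placeholders:

```shell
# Sketch only: "target"/"source" and the MERGE statement are hypothetical.
# The config key is from this thread; it stays disabled by default.
./bin/spark-sql \
  --conf spark.sql.mergeNestedTypeCoercion.enabled=true \
  -e "MERGE INTO target t USING source s ON t.id = s.id \
      WHEN MATCHED THEN UPDATE SET *"
```

Running existing MERGE workloads with nested structs both with and without the flag would be one way to surface regressions before the behavior is enabled by default.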

Note that I already commented on this decision on the PR last week.

Since you asked, I'll block that PR explicitly until we release Apache Spark 
4.2.0.

Dongjoon.

On 2025/12/16 00:01:39 Anton Okolnychyi wrote:
> I think we also need to get https://github.com/apache/spark/pull/53360 in.
> We detected data loss in Iceberg MERGE after some changes in 4.1.
> 
> - Anton
> 
> Mon, Dec 15, 2025 at 15:56, Dongjoon Hyun <[email protected]> wrote:
> 
> > Thank you all for your new feedback on RC3.
> >
> > I am concluding this RC3 vote as not passed again and preparing RC4.
> >
> > RC4 is RC3 plus the following patches, which have currently landed on
> > branch-4.1. Please let me know if you need more patches.
> >
> > SPARK-54696 Clean-up ArrowBuffers in Connect
> > SPARK-54686 Relax DSv2 table checks in temp views to allow new top-level
> > columns
> > SPARK-53991 Enforce KLL_SKETCH_AGG_GET_RANK/QUANTILE arguments are foldable
> > SPARK-54692 Add python_worker_logs tvf doc to API reference
> > SPARK-54683 Unify geo and time types blocking
> > SPARK-54689 Make `org.apache.spark.sql.pipelines` internal package and
> > make `EstimatorUtils` private
> > SPARK-54695 StandaloneDynamicAllocationSuite.syncExecutors should ensure
> > executors have fully setup
> >
> > Dongjoon Hyun.
> >
> > On 2025/12/15 14:59:32 Herman van Hovell via dev wrote:
> > > I pasted a non-existing link for the root cause. The actual link is here:
> > > https://issues.apache.org/jira/browse/SPARK-53342
> > >
> > >
> > > On Mon, Dec 15, 2025 at 10:47 AM Herman van Hovell <[email protected]> wrote:
> > >
> > > > Hey Dongjoon,
> > > >
> > > > Regarding your questions.
> > > >
> > > >    1. If you define a large-ish local relation (which makes us cache
> > > >    it on the server side) and keep using it, then we leak off-heap
> > > >    memory every time it is used. At some point the OS will OOM-kill
> > > >    the driver. While I have a repro, testing it like this in CI is not
> > > >    a good idea. As an alternative, I am working on a test that checks
> > > >    buffer clean-up. For the record, I don't appreciate the term
> > > >    `claim` here; I am not blocking a release without genuine concern.
> > > >    2. The root cause is
> > > >    https://databricks.atlassian.net/browse/SPARK-53342 and not the
> > > >    large local relations work.
> > > >    3. A PR has been open since Friday:
> > > >    https://github.com/apache/spark/pull/53452. I hope that I can get
> > > >    it merged today.
> > > >    4. I don't see a reason why.
> > > >
> > > > Cheers,
> > > > Herman
> > > >
> > > > On Mon, Dec 15, 2025 at 5:47 AM Dongjoon Hyun <[email protected]> wrote:
> > > >
> > > >> How can we verify the regression, Herman?
> > > >>
> > > >> It's a little difficult for me to evaluate your claim so far due to
> > > >> the lack of shared information. Specifically, there has been no
> > > >> update for the last 3 days on "SPARK-54696 (Spark Connect
> > > >> LocalRelation support leak off-heap memory)" after you created it.
> > > >>
> > > >> Could you provide us with more technical information about your
> > > >> Spark Connect issue?
> > > >>
> > > >> 1. How can we reproduce your claim? Do you have a test case?
> > > >>
> > > >> 2. For the root cause, I'm wondering if you mean literally
> > > >> SPARK-53917 (Support large local relations) or another JIRA issue.
> > > >> Which commit is the root cause?
> > > >>
> > > >> 3. Since you have had SPARK-54696 assigned to yourself for the last
> > > >> 3 days, do you want to provide a PR soon?
> > > >>
> > > >> 4. If you need more time, shall we simply revert the root cause
> > > >> from Apache Spark 4.1.0?
> > > >>
> > > >> Thanks,
> > > >> Dongjoon
> > > >>
> > > >> On 2025/12/14 23:29:59 Herman van Hovell via dev wrote:
> > > >> > Yes. It is a regression in Spark 4.1. The root cause is a change
> > > >> > where we fail to clean up allocated (off-heap) buffers.
> > > >> >
> > > >> > On Sun, Dec 14, 2025 at 4:25 AM Dongjoon Hyun <[email protected]> wrote:
> > > >> >
> > > >> > > Hi, Herman.
> > > >> > >
> > > >> > > Do you mean that this is a regression in Apache Spark 4.1.0?
> > > >> > >
> > > >> > > If so, do you know what the root cause was?
> > > >> > >
> > > >> > > Dongjoon.
> > > >> > >
> > > >> > > On 2025/12/13 23:09:02 Herman van Hovell via dev wrote:
> > > >> > > > -1. We need to get
> > > >> > > > https://issues.apache.org/jira/browse/SPARK-54696 fixed.
> > > >> > > >
> > > >> > > > On Sat, Dec 13, 2025 at 11:07 AM Jules Damji <[email protected]> wrote:
> > > >> > > >
> > > >> > > > > +1 non-binding
> > > >> > > > > —
> > > >> > > > > Sent from my iPhone
> > > >> > > > > Pardon the dumb thumb typos :)
> > > >> > > > >
> > > >> > > > > > On Dec 11, 2025, at 8:34 AM, [email protected] wrote:
> > > >> > > > > >
> > > >> > > > > > Please vote on releasing the following candidate as Apache
> > > >> > > > > > Spark version 4.1.0.
> > > >> > > > > >
> > > >> > > > > > The vote is open until Sun, 14 Dec 2025 09:34:31 PST and
> > > >> > > > > > passes if a majority of +1 PMC votes are cast, with a
> > > >> > > > > > minimum of 3 +1 votes.
> > > >> > > > > >
> > > >> > > > > > [ ] +1 Release this package as Apache Spark 4.1.0
> > > >> > > > > > [ ] -1 Do not release this package because ...
> > > >> > > > > >
> > > >> > > > > > To learn more about Apache Spark, please see
> > > >> > > > > > https://spark.apache.org/
> > > >> > > > > >
> > > >> > > > > > The tag to be voted on is v4.1.0-rc3 (commit e221b56be7b):
> > > >> > > > > > https://github.com/apache/spark/tree/v4.1.0-rc3
> > > >> > > > > >
> > > >> > > > > > The release files, including signatures, digests, etc. can
> > > >> > > > > > be found at:
> > > >> > > > > > https://dist.apache.org/repos/dist/dev/spark/v4.1.0-rc3-bin/
> > > >> > > > > >
> > > >> > > > > > Signatures used for Spark RCs can be found in this file:
> > > >> > > > > > https://downloads.apache.org/spark/KEYS
> > > >> > > > > >
> > > >> > > > > > The staging repository for this release can be found at:
> > > >> > > > > > https://repository.apache.org/content/repositories/orgapachespark-1508/
> > > >> > > > > >
> > > >> > > > > > The documentation corresponding to this release can be
> > > >> > > > > > found at:
> > > >> > > > > > https://dist.apache.org/repos/dist/dev/spark/v4.1.0-rc3-docs/
> > > >> > > > > >
> > > >> > > > > > The list of bug fixes going into 4.1.0 can be found at the
> > > >> > > > > > following URL:
> > > >> > > > > > https://issues.apache.org/jira/projects/SPARK/versions/12355581
> > > >> > > > > >
> > > >> > > > > > FAQ
> > > >> > > > > >
> > > >> > > > > > =========================
> > > >> > > > > > How can I help test this release?
> > > >> > > > > > =========================
> > > >> > > > > >
> > > >> > > > > > If you are a Spark user, you can help us test this release
> > > >> > > > > > by taking an existing Spark workload and running it on this
> > > >> > > > > > release candidate, then reporting any regressions.
> > > >> > > > > >
> > > >> > > > > > If you're working in PySpark, you can set up a virtual env
> > > >> > > > > > and install the current RC via "pip install
> > > >> > > > > > https://dist.apache.org/repos/dist/dev/spark/v4.1.0-rc3-bin/pyspark-4.1.0.tar.gz"
> > > >> > > > > > and see if anything important breaks.
> > > >> > > > > > In Java/Scala, you can add the staging repository to your
> > > >> > > > > > project's resolvers and test with the RC (make sure to clean
> > > >> > > > > > up the artifact cache before/after so you don't end up
> > > >> > > > > > building with an out-of-date RC going forward).
> > > >> > > > > >
> > > >> > > > > >
> > > >> ---------------------------------------------------------------------
> > > >> > > > > > To unsubscribe e-mail: [email protected]
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > >
> >
> >
> >
> 

