Re: [DISCUSS] Next Release (4.3.0)

Ayush Saxena Wed, 17 Jun 2026 23:38:17 -0700

Hi Folks,

As always, I tend to look at this from a slightly different
perspective: what does this release actually bring to users? Is there
enough value here that people will invest the effort to upgrade, or
will this end up being a release where we, as developers, put
significant effort into release management while users remain largely
uninterested in moving from previous versions?


I do know that there has been meaningful work going into this release.
There is some Iceberg V3-related work, although personally I don't
think that's particularly compelling without also moving to Iceberg
1.11. We have REST Catalog work in progress, and I noticed the
credential vending PR is still pending. I recently tried setting
things up myself, and without credential vending support (and
especially when using Apache Ozone), it was quite painful and required
several workarounds. I'm not sure how strongly users will push for the
current state of those features.

I completely agree that we have a healthy collection of bug fixes,
improvements, and maintenance work. The question I keep coming back to
is whether that justifies a 4.3.0 release, or whether the scope is
more aligned with a 4.2.1 release.

When I started this thread, I had intended to drive some larger
initiatives, particularly around Hadoop and JDK upgrades.
Unfortunately, life happened, and I have not been able to dedicate the
bandwidth I expected. Some of the blockers are not even entirely clear
to me at this point, and I don't see myself actively pursuing them in
the near future. It would have been good to get those in. I am also
noticing that many projects have already moved, or are moving, to
Hadoop 3.5.0 while we have not. At some point, compatibility concerns
may start surfacing, especially as people begin experimenting with
newer JDKs such as JDK 25.

I was also hoping we would have upstream performance benchmarks in
place starting with this release, as Attila had suggested, providing a
baseline against which future releases could be validated. That effort
does not seem particularly close either, and rushing it would likely
do more harm than good.

That said, I am perfectly fine with cutting an RC soon and will
happily fulfill my responsibilities in validating and voting on the
release. We should, however, make sure that the issues raised during
the previous release vote have been addressed and do not reappear this
time.

What I am genuinely curious about is what users or contributors are
most looking forward to seeing officially published in this release.
If the answer is primarily bug fixes and incremental improvements,
that's fine, but it is worth asking whether we are releasing because
there is meaningful value to deliver or simply because it is time for
another release. I am certainly not looking at every corner of the
codebase, so there may be substantial work that I am unaware of which,
by itself, justifies a 4.3.0 release. I am mainly trying to
understand the rationale.

Ultimately, I am comfortable with whatever direction the community
feels is best. However, I do want to highlight that our overall
developer bandwidth is quite limited. We have a small core team, and
not all active contributors are able to take on release management
activities, or interested in doing so. Because of this, I strongly
suggest we keep our release timelines generous. Pushing for realistic
schedules will protect the well-being of our developers and ensure the
long-term health of the project.

-Ayush

On Wed, 17 Jun 2026 at 13:48, Stamatis Zampetakis <[email protected]> wrote:
>
> Hello,
>
> Every user or developer has a different definition of what constitutes a 
> blocker. For me blockers fall into one of the two broad categories:
> i)  regressions (wrong results, performance degradation, crashes & errors, 
> etc.)
> ii) security issues
>
> Based on my categorization, HIVE-29543 seems more like a nice-to-have rather 
> than a release blocker. If many users expect this, we can definitely postpone 
> the release cut. However, if that's the case they should comment on this 
> thread because we have no way of knowing otherwise.
>
> Other than that, it doesn't make much sense to cut the release branch now if 
> we are aiming to wait another month or so for certain features to be 
> included. From the moment the branch is cut we should immediately kick off 
> the release process.
>
> The release manager has the final say on what must be part of the release, so 
> I am fine deferring the decision to Sai. The release schedule must also fit 
> the release manager's personal schedule.
>
> Best,
> Stamatis
>
>
> On Tue, Jun 16, 2026 at 1:31 AM Sai Hemanth Gantasala 
> <[email protected]> wrote:
>>
>> Hi everyone,
>>
>> I’d like to get some feedback regarding the branch cutoff date for the 4.3.0 
>> release, which we initially aimed for mid-June.
>>
>> We currently have a potential blocker with HIVE-29543 (upgrade to Hadoop 
>> 3.5). This is currently blocked by the pending Tez and Atlas releases. 
>> Kokila is working on this and would like to wait for the Tez release, though 
>> the timeline for the Atlas release remains uncertain.
>>
>> Given these dependencies, I would like to hear the team's thoughts on how we 
>> should proceed with the branch cutoff. Should we hold for these upgrades or 
>> move forward with the current plan?
>>
>> Regards,
>> Sai.
>>
>>
>> On Thu, May 21, 2026 at 12:00 PM Sai Hemanth Gantasala 
>> <[email protected]> wrote:
>>>
>>> Hi Stamatis,
>>>
>>> I'm thinking of mid-June for the branch cutoff and aiming for the release 
>>> by the end of June.
>>>
>>> However, I'd like to open this timeline up to other contributors, 
>>> especially external ones, to comment. Please let us know if there are any 
>>> important patches or blockers that need to be accommodated before we 
>>> finalize these dates.
>>>
>>> Thanks,
>>> Sai.
>>>
>>>
>>> On Wed, May 20, 2026 at 12:38 AM Stamatis Zampetakis <[email protected]> 
>>> wrote:
>>>>
>>>> Hey Sai,
>>>>
>>>> Thanks for volunteering to be the release manager for 4.3.0! Our last
>>>> release was six months ago so it's good to get another one out soon.
>>>> What timeline do you have in mind?
>>>>
>>>> Best,
>>>> Stamatis
>>>>
>>>> On Wed, May 20, 2026 at 2:13 AM Sai Hemanth Gantasala
>>>> <[email protected]> wrote:
>>>> >
>>>> > Hello Team,
>>>> >
>>>> > I would like to add https://issues.apache.org/jira/browse/HIVE-29622 (a 
>>>> > recently reported CVE) to the 4.3.0 release items.
>>>> > Also, I’d like to volunteer as the release manager for this release and 
>>>> > start working on the release process.
>>>> >
>>>> > Best Regards,
>>>> > Sai.
>>>> >
>>>> > On Fri, Apr 3, 2026 at 1:33 AM Ayush Saxena <[email protected]> wrote:
>>>> >>
>>>> >> Hadoop 3.5.0 has been released [1], so it is no longer a blocker from
>>>> >> the Hadoop side. We now need to upgrade Hive, and I have created a
>>>> >> ticket to track the upgrade [2].
>>>> >>
>>>> >> -Ayush
>>>> >>
>>>> >> [1] https://lists.apache.org/thread/7dtnbdqrgt30oszd1w1vo7k68z0n7r4b
>>>> >> [2] https://issues.apache.org/jira/browse/HIVE-29543
>>>> >>
>>>> >> On Mon, 16 Mar 2026 at 04:32, Ayush Saxena <[email protected]> wrote:
>>>> >> >
>>>> >> > Thanks, folks, for the pointers on the performance testing. Let me
>>>> >> > discuss this internally and come back with something more concrete.
>>>> >> > One idea that comes to mind is that we are currently using LFS in our
>>>> >> > Docker images; instead, we could potentially use Apache Ozone there.
>>>> >> > They also publish Docker images, so we might be able to leverage
>>>> >> > those.
>>>> >> >
>>>> >> > -Ayush
>>>> >> >
>>>> >> > On Tue, 10 Mar 2026 at 12:31, László Bodor 
>>>> >> > <[email protected]> wrote:
>>>> >> > >
>>>> >> > > Regarding performance benchmarking, we should have a way to test 
>>>> >> > > the actual upstream code. While many - or all - Hive distributors 
>>>> >> > > have their own ways of doing this, we as an open-source community 
>>>> >> > > don't. The main limitation is the testing setup, because our 
>>>> >> > > current single-image (HS2) or HS2+HMS Docker setup is not suitable 
>>>> >> > > for this purpose, even though it works wonderfully for quick local 
>>>> >> > > testing.
>>>> >> > > That's what's currently being addressed in the scope of 
>>>> >> > > https://issues.apache.org/jira/browse/HIVE-29492.
>>>> >> > >
>>>> >> > > Regards,
>>>> >> > > Laszlo Bodor
>>>> >> > >
>>>> >> > >
>>>> >> > > On Tue, 10 Mar 2026 at 07:38, kokila narayanan 
>>>> >> > > <[email protected]> wrote:
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> Regarding the performance tracking initiative and Hive-Iceberg 
>>>> >> > >> workloads, one possible starting point could be leveraging the 1 
>>>> >> > >> Trillion Row Challenge (1TRC) style benchmarks.
>>>> >> > >>
>>>> >> > >> The Impala community has already experimented with something along 
>>>> >> > >> these lines and they have even extended it to work with Iceberg 
>>>> >> > >> tables as well:
>>>> >> > >> https://github.com/boroknagyz/impala-1trc
>>>> >> > >>
>>>> >> > >> The main query is a relatively simple aggregation query:
>>>> >> > >>
>>>> >> > >> SELECT station, min(measure), max(measure), avg(measure)
>>>> >> > >> FROM measurements_1trc
>>>> >> > >> GROUP BY station
>>>> >> > >> ORDER BY station;
>>>> >> > >>
>>>> >> > >> While this benchmark is quite simple and only tests a single type 
>>>> >> > >> of query, it could still be a good starting point. It does not 
>>>> >> > >> cover the wider variety of queries we usually see in Hive 
>>>> >> > >> workloads (like joins, filters, or more complex aggregations), but 
>>>> >> > >> it is easy to reproduce and run.
>>>> >> > >>
>>>> >> > >> With this setup, it could help us get an initial idea of how Hive 
>>>> >> > >> performs on very large Iceberg tables for large-scale scan and 
>>>> >> > >> aggregation workloads.
>>>> >> > >>
>>>> >> > >> I have experimented with this dataset for another feature so I can 
>>>> >> > >> also try running 1BRC/1TRC on Hive and share some initial numbers 
>>>> >> > >> if that would be useful for the release planning.
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >>
>>>> >> > >> Kokila
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> On Tue, Mar 10, 2026 at 11:43 AM Ayush Saxena <[email protected]> 
>>>> >> > >> wrote:
>>>> >> > >>>
>>>> >> > >>> Hadoop 3.5.0 is currently in the RC stage (RC0 is already 
>>>> >> > >>> available). I think we can reasonably wait for the final 3.5.0 
>>>> >> > >>> release, and if time and luck favor us, we could even try giving 
>>>> >> > >>> JDK 25 a shot as well. From a timeline perspective, I don’t think 
>>>> >> > >>> we are too late yet.
>>>> >> > >>>
>>>> >> > >>> More broadly, my expectation—or perhaps wish—for the upcoming 
>>>> >> > >>> release would be to include Hadoop 3.5 + Iceberg V3 + JDK 25 + 
>>>> >> > >>> REST Catalog related changes. Having these in the release would 
>>>> >> > >>> make it more compelling for users to upgrade, rather than it 
>>>> >> > >>> feeling like just another bug-fix release that gives the 
>>>> >> > >>> impression we are in KTLO mode. :-)
>>>> >> > >>>
>>>> >> > >>> As Attila also mentioned above regarding performance tracking, I 
>>>> >> > >>> would definitely like to push that initiative as part of this 
>>>> >> > >>> release. We may not have something perfect right away, but at 
>>>> >> > >>> least we should have a starting point. At the moment, we 
>>>> >> > >>> essentially have nothing in this area. We can always refine the 
>>>> >> > >>> strategy and improve the benchmarks in future releases, but it 
>>>> >> > >>> would be good to have something tangible that we can showcase.
>>>> >> > >>> Personally, I am inclined towards experimenting around 
>>>> >> > >>> Hive–Iceberg workloads, gathering numbers for specific use cases 
>>>> >> > >>> or queries, and drawing some comparisons.
>>>> >> > >>>
>>>> >> > >>> If anyone has already worked on something similar, or has ideas 
>>>> >> > >>> or proposals for how we could approach this, please do share.
>>>> >> > >>>
>>>> >> > >>> -Ayush
>>>> >> > >>>
>>>> >> > >>> On Mon, 9 Feb 2026 at 14:13, Shohei Okumiya <[email protected]> 
>>>> >> > >>> wrote:
>>>> >> > >>>>
>>>> >> > >>>> Hi,
>>>> >> > >>>>
>>>> >> > >>>> I'm curious about the remaining blockers. From my perspective,
>>>> >> > >>>> HIVE-29445 and HIVE-29415 might be needed if we include Iceberg v3. I
>>>> >> > >>>> think it's possible to put it off until 4.4. HIVE-29415 requires
>>>> >> > >>>> Iceberg 1.10.2 or 1.11.0 if I understand correctly.
>>>> >> > >>>>
>>>> >> > >>>> Hadoop 3.5 is nice, but it hasn't been released yet. Most likely, we
>>>> >> > >>>> need to keep using 3.4 for a while.
>>>> >> > >>>>
>>>> >> > >>>> If we release 4.3 now, I think we should upgrade the Iceberg library
>>>> >> > >>>> from 1.10.0 to 1.10.1, which has some bug fixes and is not a big
>>>> >> > >>>> effort.
>>>> >> > >>>>
>>>> >> > >>>> Regards,
>>>> >> > >>>> Okumin
>>>> >> > >>>>
>>>> >> > >>>> On Thu, Jan 22, 2026 at 7:44 PM László Bodor 
>>>> >> > >>>> <[email protected]> wrote:
>>>> >> > >>>> >
>>>> >> > >>>> > As to:
>>>> >> > >>>> >
>>>> >> > >>>> > #4 Hadoop 3.5 support would be great. Do we plan to include a 
>>>> >> > >>>> > newer Tez version in 4.5? From what I can see, a significant 
>>>> >> > >>>> > number of changes have recently landed in the repository.
>>>> >> > >>>> >
>>>> >> > >>>> > I don’t think Tez will reach 1.0.0 before Hive 4.5. Given the 
>>>> >> > >>>> > major version milestone, we’re aiming to push more changes and 
>>>> >> > >>>> > are less afraid of breaking things. So unless there’s 
>>>> >> > >>>> > something blocking, I believe Hive 4.5 can continue to use Tez 
>>>> >> > >>>> > 0.10.5. My personal expectation for Tez 1.0.0 is "sometime 
>>>> >> > >>>> > later this year".
>>>> >> > >>>> >
>>>> >> > >>>> >
>>>> >> > >>>> > On Tue, 20 Jan 2026 at 15:45, Ayush Saxena 
>>>> >> > >>>> > <[email protected]> wrote:
>>>> >> > >>>> >>
>>>> >> > >>>> >> Hi Attila,
>>>> >> > >>>> >> Regarding:
>>>> >> > >>>> >>
>>>> >> > >>>> >>> As you mentioned, Iceberg v3 is a major part of this 
>>>> >> > >>>> >>> release. I fully agree, and I think we should clearly 
>>>> >> > >>>> >>> highlight that Hive is one of the core engines supporting 
>>>> >> > >>>> >>> Iceberg v3. Potentially even earlier than Trino or other 
>>>> >> > >>>> >>> competitors. One thing I would like to draw attention to 
>>>> >> > >>>> >>> (coming from discussions with the Apache Impala team) is 
>>>> >> > >>>> >>> that the Vector Delete spec seems to have changed, with 
>>>> >> > >>>> >>> row-lineage becoming a prerequisite. As far as I remember, 
>>>> >> > >>>> >>> this is not yet implemented in Hive. If we want Hive to 
>>>> >> > >>>> >>> officially support Iceberg v3 with vector deletes, we should 
>>>> >> > >>>> >>> verify and address this gap. 
>>>> >> > >>>> >>> https://iceberg.apache.org/spec/#row-lineage
>>>> >> > >>>> >>
>>>> >> > >>>> >>
>>>> >> > >>>> >> -----
>>>> >> > >>>> >> I’m not entirely sure what the issue is on the Impala side. 
>>>> >> > >>>> >> Iceberg V3 writes and Deletion Vectors are working correctly 
>>>> >> > >>>> >> in Hive, even with the latest Iceberg version. As far as I 
>>>> >> > >>>> >> know, Iceberg V3 does not allow committing a snapshot unless 
>>>> >> > >>>> >> row IDs are populated. We also have tests in place that cover 
>>>> >> > >>>> >> writes and deletes for Iceberg V3.
>>>> >> > >>>> >>
>>>> >> > >>>> >> We don’t have anything explicit for row lineage because Hive 
>>>> >> > >>>> >> relies on Iceberg writers; we haven’t implemented custom 
>>>> >> > >>>> >> writers. As a result, the Iceberg layer is responsible for 
>>>> >> > >>>> >> populating the row IDs and the next row ID, and that seems to 
>>>> >> > >>>> >> be working as expected.
>>>> >> > >>>> >>
>>>> >> > >>>> >> I tested this locally and verified the metadata files, which 
>>>> >> > >>>> >> clearly contain the row IDs. I’m attaching screenshots of the 
>>>> >> > >>>> >> metadata for reference.
>>>> >> > >>>> >>
>>>> >> > >>>> >> If Impala is observing unexpected behavior and there turns 
>>>> >> > >>>> >> out to be an issue with our implementation, they can report 
>>>> >> > >>>> >> it via a ticket. However, from a fundamentals point of view, 
>>>> >> > >>>> >> this looks correct on the Hive/Iceberg side.
>>>> >> > >>>> >>
>>>> >> > >>>> >> -Ayush
>>>> >> > >>>> >>
>>>> >> > >>>> >>
>>>> >> > >>>> >> On Tue, 20 Jan 2026 at 19:24, Denys Kuzmenko 
>>>> >> > >>>> >> <[email protected]> wrote:
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> Hi everyone,
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> +1 on collecting the performance numbers.
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> I’d like to propose a few additional items to consider:
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> #1 REST Catalog HA and vended credentials support
>>>> >> > >>>> >>> - HIVE-29391,
>>>> >> > >>>> >>> - HIVE-29228
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> #2 Federated Catalog support
>>>> >> > >>>> >>> - HIVE-28879
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> #3 Kubernetes manifests / Helm chart for Apache Hive 
>>>> >> > >>>> >>> deployment
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> #4 New V3 items (that I am aware of)
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> 1. VARIANT shredding:
>>>> >> > >>>> >>>   - HIVE-29287,
>>>> >> > >>>> >>>   - HIVE-29354
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> 2. Z-order support for Iceberg tables:
>>>> >> > >>>> >>>   - HIVE-29132
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> Best regards,
>>>> >> > >>>> >>> Denys
