Thanks to everyone who helped to create the Hive 4.0.0-alpha-1 release!
I really hope this helps our users to try out our previously unreleased new
features.

As a last step of the release process, I will update the versions for the
next release.
I would like to ask your opinion about the next version.

Which version should we use for development?
- 4.0.0-SNAPSHOT
- 4.0.0-alpha-2-SNAPSHOT
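
For reference, here is a minimal sketch of how the two candidates would sort against the release just made. It assumes Maven-style qualifier ordering (alpha < beta < milestone < rc < snapshot < release) and only handles versions of the form X.Y.Z with optional dash-separated suffixes. It is a simplified illustration, not Maven's actual comparison code; `sort_key` and the rank table are hypothetical names.

```python
import re

# Assumed qualifier ranks, loosely following Maven's documented ordering.
QUALIFIER_RANK = {"alpha": 1, "beta": 2, "milestone": 3, "rc": 4, "snapshot": 5}
RELEASE = (6, 0)   # implicit marker for "no further qualifier"
NUMBER_RANK = 7    # a number after a dash sorts above any qualifier

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)((?:-[A-Za-z0-9]+)*)$")


def sort_key(version):
    """Return a sortable key for versions like 4.0.0-alpha-2-SNAPSHOT."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("unsupported version format: " + version)
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    suffix = []
    # group(4) looks like "-alpha-2-SNAPSHOT"; the first split item is empty.
    for token in match.group(4).split("-")[1:]:
        if token.isdigit():
            suffix.append((NUMBER_RANK, int(token)))
        elif token.lower() in QUALIFIER_RANK:
            suffix.append((QUALIFIER_RANK[token.lower()], 0))
        else:
            raise ValueError("unknown qualifier: " + token)
    suffix.append(RELEASE)
    return (major, minor, patch, suffix)


candidates = ["4.0.0", "4.0.0-SNAPSHOT", "4.0.0-alpha-2-SNAPSHOT", "4.0.0-alpha-1"]
for version in sorted(candidates, key=sort_key):
    print(version)
```

Under these assumptions both candidates sort after 4.0.0-alpha-1 and before the final 4.0.0, so either would work as the development version; the choice is mainly about what the next release is expected to be called.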

Thanks,
Peter

On Mon, 21 Mar 2022 at 15:59, Peter Vary <pv...@cloudera.com> wrote:

> Hi Team,
>
> If everyone agrees, tomorrow I would like to start the release process
> for 4.0.0-alpha-1.
>
> Is there any outstanding blocker jira that you know of?
>
> Thanks,
> Peter
>
>
> > On 2022. Mar 9., at 17:01, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >
> > I just logged HIVE-26022 [1] which seems to be another potential blocker
> > for 4.0.0-alpha-1.
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26022
> >
> > On Thu, Mar 3, 2022 at 3:54 PM Peter Vary <pv...@cloudera.com> wrote:
> >
> >> Hi Team,
> >>
> >> Here is our status:
> >> We collected the blocker tickets and marked them with fixVersion
> >> 4.0.0-alpha-1:
> >>
> >>
> >> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> >>
> >>   - HIVE-26002 - Create db scripts for 4.0.0-alpha-1
> >>   - HIVE-25994 - Analyze table runs into ClassNotFoundException-s in
> >>   case binary distribution is used
> >>   - HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
> >>
> >> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you
> >> happen to know of any other blockers.
> >>
> >> We plan to fix these jiras, and then release the following artifacts
> >> together:
> >>
> >>   - Storage API - 4.0.0-alpha-1
> >>   - Standalone Metastore - 4.0.0-alpha-1
> >>   - Hive - 4.0.0-alpha-1
> >>
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >> On 2022. Mar 2., at 11:50, Peter Vary <pv...@cloudera.com> wrote:
> >>
> >> Will continue this discussion on the #hive ASF slack. If you are
> >> interested, please join.
> >> We will post updates here from time to time, so those who are not using
> >> Slack can participate that way.
> >>
> >> On 2022. Mar 2., at 11:11, Peter Vary <pv...@cloudera.com> wrote:
> >>
> >> Good idea Zoltan, joined the channel.
> >> I would like to scope reasonably small, so I agree with focusing on
> >> 4.0.0-alpha-1
> >>
> >> On 2022. Mar 2., at 11:01, Zoltan Haindrich <k...@rxd.hu> wrote:
> >>
> >> Hey,
> >>
> >> regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
> >> * I think we should change all already resolved tickets with fix version
> >> 4.0.0 to have fix version 4.0.0-alpha-1
> >> ** this could be postponed until we are actually releasing the thing, as I
> >> think everyone committing to the master is entering 4.0.0 as fix version
> >> without much afterthought...this could probably change after we get the
> >> first release out.
> >> * regarding the existing tickets with fix version/target version 4.0.0
> >> - I think that would be a bit too much (>200 tickets)
> >> ** some numbers:
> >> *** 239 tickets open now
> >> *** 224 were not updated in the last 90 days
> >> *** 216 were not changed in the last 180 days
> >> *** 178 were not updated in the last 360 days
> >> ** as a matter of fact I think many of these tickets shouldn't even have a
> >> target or fix version - and most of them should be unassigned...I don't
> >> want to get lost in this right now...I think for now we should keep the
> >> scope small and only care about 4.0.0-alpha-1 tickets
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))
> >>
> >> I think for faster communication regarding these things we could also
> >> utilize the #hive channel on the ASF slack - what do you guys think?
> >>
> >> cheers,
> >> Zoltan
> >>
> >> On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
> >>
> >> Agree with Peter, creating JIRAs is the way to go.
> >> Putting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or
> >> 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
> >> I am mentioning both 4.0.0 and 4.0.0-alpha-1 because eventually I think we
> >> are gonna move everything with target 4.0.0 to 4.0.0-alpha-1.
> >>
> >> On Wed, Mar 2, 2022 at 9:37 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
> >>
> >> Hi Team,
> >>
> >> Could we create tickets for the issues?
> >> I think it would be good to collect the issues/potential blockers in the
> >> jira instead of having a complicated mail thread.
> >>
> >> If we set the target version to 4.0.0-alpha-1, then we can easily use the
> >> following filter to see the status of the tasks:
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22
> >>
> >>
> >> @Stamatis: Sadly I have missed your letter/jira and created my own with
> >> the fix for building from the src package:
> >> https://issues.apache.org/jira/browse/HIVE-25997
> >> If you have time, I would like to ask you to review.
> >>
> >> If anyone knows of any blocker I would like to ask them to create a jira
> >> for that too.
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >> On 2022. Mar 2., at 7:04, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
> >>
> >> Hello Alessandro,
> >>
> >> For the latest commit, loading ORC tables fails (with the log message
> >> shown below). Let me try to find a commit that introduces this bug and
> >> create a JIRA ticket.
> >>
> >> --- Sungwoo
> >>
> >> 2022-03-02 05:41:56,578 ERROR [Thread-73] exec.StatsTask: Failed to run stats task
> >> java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
> >>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:622)
> >>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.constructColumnStatsFromPackedRows(ColStatsProcessor.java:105)
> >>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:200)
> >>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:93)
> >>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
> >>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
> >>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> >>     at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
> >> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
> >>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
> >>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
> >>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
> >>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
> >>     at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:435)
> >>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:402)
> >>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:306)
> >>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
> >>     ... 7 more
> >>
> >> On Tue, 1 Mar 2022, Alessandro Solimando wrote:
> >>
> >> Hi Sungwoo,
> >> last time I tried to run a TPCDS-based benchmark I stumbled upon a similar
> >> situation. In the end I found that statistics were not computed, so CBO was
> >> not kicking in, and the automatic retry ran with CBO off, which was failing
> >> for something like 10 queries (subqueries cannot be decorrelated, but also
> >> some runtime errors).
> >>
> >> Making sure that (column) statistics were correctly computed fixed the
> >> problem.
> >>
> >> Can you check if this is the case for you?
> >>
> >> HTH,
> >> Alessandro
> >>
> >> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
> >>
> >> Hello Hive team,
> >>
> >> I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
> >> the master branch recently.  We occasionally run TPC-DS system tests
> >> using the master branch, and the tests don't succeed completely. Here
> >> is how our TPC-DS tests proceed.
> >>
> >> 1. Compile and run Hive on Tez (not Hive-LLAP)
> >> 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
> >> 3. Run 99 TPC-DS queries which were slightly modified to return
> >> varying number of rows (rather than 100 rows)
> >> 4. Compare the results against the previous results
> >>
> >> The previous results were obtained and cross-checked by running Hive
> >> 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
> >> correctness.
> >>
> >> For the latest commit in the master branch, step 2 fails. For earlier
> >> commits (for example, commits in February 2021), step 3 fails where
> >> several queries either fail or return wrong results.
> >>
> >> We can compile and report the test results in this mailing list, but
> >> would like to know if similar results have been reproduced by the Hive
> >> team, in order to make sure that we did not make errors in our tests.
> >>
> >> If it is okay to open a JIRA ticket that only reports failures in the
> >> TPC-DS test, we could also perform git bisect to locate the commit
> >> that began to generate wrong results.
> >>
> >> --- Sungwoo Park
> >>
> >> On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
> >>
> >> Hey,
> >>
> >> Great to hear that we are on the same side regarding these things :)
> >>
> >> For around a week now - we have nightly builds for the master branch:
> >> http://ci.hive.apache.org/job/hive-nightly/12/
> >>
> >> I think we have 1 blocker issue:
> >> https://issues.apache.org/jira/browse/HIVE-25665
> >>
> >> I know about one more thing I would rather get fixed before we release it:
> >> https://issues.apache.org/jira/browse/HIVE-25994
> >> The best would be to introduce smoke tests (HIVE-22302) to ensure that
> >> something like this will not happen in the future - but we should probably
> >> start moving forward.
> >>
> >> I think we could call the first iteration of this "4.0.0-alpha-1" :)
> >>
> >> I've added 4.0.0-alpha-1 as a version - and added the above two tickets
> >> to it.
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> >>
> >>
> >> Are there any more things you guys know which would be needed?
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >> On 2/22/22 12:18 PM, Peter Vary wrote:
> >>
> >> I would vote for 4.0.0-alpha-1 or similar for all of the components.
> >>
> >> When we have more stable releases I would keep the 4.x.x schema, since
> >> everyone is familiar with it, and I do not see a really good reason to
> >> change it.
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
> >>
> >> +1 that would be awesome to see Hive master released after so long.
> >>
> >> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
> >> any 3.x or calendar date (which could tend to slip and be more
> >> confusing?).
> >>
> >> Thanks in any case to get the ball rolling.
> >> Szehon
> >>
> >> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> >>
> >> Hey,
> >>
> >> Thank you guys for chiming in; versioning is for sure something we should
> >> find some common ground on.
> >> It's a triple problem right now; I think we have the following things:
> >>
> >> * storage-api
> >> ** we have "2.7.3-SNAPSHOT" in the repo
> >> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> >> ** meanwhile we already have 2.8.1 released to maven central
> >> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> >> * standalone-metastore
> >> ** 4.0.0-SNAPSHOT in the repo
> >> ** last release is 3.1.2
> >> * hive
> >> ** 4.0.0-SNAPSHOT in the repo
> >> ** last release is 3.1.2
> >>
> >> Regarding the actual version number I'm not entirely sure where we should
> >> start the numbering - that's why I was referring to it as Hive-X in my
> >> first letter.
> >>
> >> I think the key point here would be to start shipping releases regularly
> >> and not the actual version number we will use - I'm kinda open to any
> >> versioning scheme which reflects that this is a newer release than 3.1.2.
> >>
> >> I could imagine the following ones:
> >> (A) start with something less expected; but keep 3 in the prefix to
> >> reflect that this is not yet 4.0
> >>  I can imagine the following numbers:
> >>  3.900.0, 3.901.0, ...
> >>  3.9.0, 3.9.1, ...
> >> (B) start 4.0.0
> >>  4.0.0, 4.1.0, ...
> >> (C) jump to some calendar based version number like 2022.2.9
> >>  trunk based development has pros and cons...making a move like this
> >>  irreversibly pledges trunk based development; and makes release branches
> >>  hard to introduce
> >> (X) somewhat orthogonal is to (also) use some suffixes
> >>  4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> >>  this is probably the most tempting to use - but this versioning
> >>  schema with a non-changing MINOR and PATCH number will
> >>  also suggest that the actual software is fully compatible - and only
> >>  bugs are being fixed - which will not be true...
> >>
> >> I really like the idea to suffix these releases with alpha or beta - which
> >> will communicate our level of commitment: that these are not 100% production
> >> ready artifacts.
> >>
> >> I think we could fix HIVE-25665; and probably experiment with
> >> 4.0.0-alpha1 for a start...
> >>
> >> > This also means there should *not* be a branch-4 after releasing Hive 4.0
> >> > and let that diverge (and becomes the next, super-ignored branch-3),
> >>
> >> correct; no need to keep a branch we don't maintain...but in any case I
> >> think we can postpone this decision until there will be something to
> >> release... :)
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >>
> >> On 2/9/22 10:23 AM, László Bodor wrote:
> >>
> >> Hi All!
> >>
> >> A purely technical question: what will the SNAPSHOT version become after
> >> releasing Hive 4.0.0? I think this is important, as it defines and reflects
> >> the future release plans.
> >>
> >> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
> >>
> >> Hive is an evolving and super-active project: if we want to make regular
> >> releases, we should simply release Hive 4.0 and bump pom to 4.1.0-SNAPSHOT,
> >> which clearly says that we can release Hive 4.1 anytime we want, without
> >> being frustrated about "whether we included enough cool stuff to release
> >> 5.0".
> >>
> >> This also means there should *not* be a branch-4 after releasing Hive 4.0
> >> and let that diverge (and becomes the next, super-ignored branch-3), only
> >> when we end up bringing a minor backward-incompatible thing that needs a
> >> 4.0.x, and when it happens, we'll create *branch-4.0* on demand. For me, a
> >> branch called *branch-4.0* doesn't imply either I can expect cool releases
> >> in the future from there or the branch is maintained and tries to be in
> >> sync with the *master*.
> >>
> >> Regards,
> >> Laszlo Bodor
> >>
> >> On Tue, 8 Feb 2022 at 16:42, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
> >>
> >> Hello everyone,
> >> thank you for starting this discussion.
> >>
> >> I agree that releasing the master branch regularly and sufficiently often
> >> is welcome and vital for the health of the community.
> >>
> >> It would be great to hear from others too, especially PMC members and
> >> committers, but even simple contributors/followers as myself.
> >>
> >> Best regards,
> >> Alessandro
> >>
> >> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Thanks for starting the discussion Zoltan.
> >>
> >> I strongly believe that it is important to have regular and frequent releases
> >> otherwise people will create and maintain separate Hive forks.
> >> The latter is not good for the project and the community may lose valuable
> >> members because of it.
> >>
> >> Going forward I fully agree that there is no point bringing up strong
> >> blockers for the next release. For sure there are many backward
> >> incompatible changes and possibly unstable features but unless we get a
> >> release out it will be difficult to determine what is broken and what needs
> >> to be fixed.
> >>
> >> Due to the big number of changes that are going to appear in the next
> >> version I would suggest using the terms Hive X-alpha, Hive X-beta for the
> >> first few releases. This will make it clear to the end users that they need
> >> to be careful when upgrading from an older version and it will give us a
> >> bit more time and freedom to treat issues that the users will likely
> >> discover.
> >>
> >> The only real blocker that we may want to treat is HIVE-25665 [1] but we
> >> can continue the discussion under that ticket and re-evaluate if necessary.
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://issues.apache.org/jira/browse/HIVE-25665
> >>
> >>
> >> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> >>
> >> Hey All,
> >>
> >> We haven't made a release for a long time now (3.1.2 was released on 26
> >> August 2019) - and I think because we didn't make that many branch-3
> >> releases, not too many fixes were ported there - which made that release
> >> branch kinda erode away.
> >>
> >> We have a lot of new features/changes in the current master.
> >> I think instead of aiming for big feature-packed releases we should aim
> >> for making a regular release every few months - we should make regular
> >> releases which people could install and use.
> >> After all, releasing Hive after more than 2 years would be a big step forward
> >> in itself - we have so many improvements that I can't even count...
> >>
> >> But I may not know every aspect of the project / states of some internal
> >> features - so I would like to ask you:
> >> What would be the bare minimum requirements before we could release the
> >> current master as Hive X?
> >>
> >> There are many nice-to-have-s like:
> >> * hadoop upgrade
> >> * jdk11
> >> * remove HoS or MR
> >> * ?
> >> but I don't think these are blockers...we can make any of these in the
> >> next release if we start making them...
> >>
> >> cheers,
> >> Zoltan