Thanks to everyone who helped to create the Hive 4.0.0-alpha-1 release! I really hope this helps our users to try out our previously unreleased new features.
As a last step of the release process, I will update the versions for the next release.
I would like to ask your opinion about the next version. Which version should we use for development:
- 4.0.0-SNAPSHOT
- 4.0.0-alpha-2-SNAPSHOT

Thanks,
Peter

On Mon, 21 Mar 2022 at 15:59, Peter Vary <pv...@cloudera.com> wrote:
Hi Team,

If everyone agrees, tomorrow I would like to start the release process for 4.0.0-alpha-1.

Is there any outstanding blocker jira that you know of?

Thanks,
Peter

On 2022. Mar 9., at 17:01, Stamatis Zampetakis <zabe...@gmail.com> wrote:
I just logged HIVE-26022 [1], which seems to be another potential blocker for 4.0.0-alpha-1.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26022

On Thu, Mar 3, 2022 at 3:54 PM Peter Vary <pv...@cloudera.com> wrote:
Hi Team,

Here is our status:
We collected the blocker tickets and marked them with fixVersion 4.0.0-alpha-1:
https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1

- HIVE-26002 - Create db scripts for 4.0.0-alpha-1
- HIVE-25994 - Analyze table runs into ClassNotFoundException-s in case binary distribution is used
- HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs

If you happen to know of any other blockers, please create a jira and mark it with fixVersion 4.0.0-alpha-1.

We plan to fix these jiras, and then release the following artifacts together:
- Storage API - 4.0.0-alpha-1
- Standalone Metastore - 4.0.0-alpha-1
- Hive - 4.0.0-alpha-1

Thanks,
Peter

On 2022. Mar 2., at 11:50, Peter Vary <pv...@cloudera.com> wrote:
We will continue this discussion on the #hive channel of the ASF Slack. If you are interested, please join.
We will post updates here from time to time, so those who are not using Slack can participate that way.

On 2022. Mar 2., at 11:11, Peter Vary <pv...@cloudera.com> wrote:
Good idea Zoltan, joined the channel.
I would like to keep the scope reasonably small, so I agree with focusing on 4.0.0-alpha-1.

On 2022. Mar 2., at 11:01, Zoltan Haindrich <k...@rxd.hu> wrote:
Hey,

regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
* I think we should change all already resolved tickets with fix version 4.0.0 to have fix version 4.0.0-alpha-1
** this could be postponed until we are actually releasing the thing, as I think everyone committing to master is entering 4.0.0 as the fix version without much afterthought... this could probably change after we get the first release out.
* regarding the existing tickets with fix version/target version 4.0.0 - I think that would be a bit too much (>200 tickets)
** some numbers:
*** 239 tickets open now
*** 224 were not updated in the last 90 days
*** 216 were not changed in the last 180 days
*** 178 were not updated in the last 360 days
** as a matter of fact, I think many of these tickets shouldn't even have a target or fix version - and most of them should be unassigned... I don't want to get lost in this right now... I think for now we should keep the scope small and only care about 4.0.0-alpha-1 tickets

https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))

I think for faster communication regarding these things we could also use the #hive channel on the ASF Slack - what do you guys think?

cheers,
Zoltan

On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
Agree with Peter, creating JIRAs is the way to go.
Setting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
I am mentioning both 4.0.0 and 4.0.0-alpha-1 because I think we are eventually going to move everything with target 4.0.0 to 4.0.0-alpha-1.

On Wed, Mar 2, 2022 at 9:37 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
Hi Team,

Could we create tickets for the issues?
I think it would be good to collect the issues/potential blockers in jira instead of in a complicated mail thread.

If we set the target version to 4.0.0-alpha-1, then we can easily use the following filter to see the status of the tasks:
https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22

@Stamatis: Sadly I missed your letter/jira and created my own with the fix for building from the src package:
https://issues.apache.org/jira/browse/HIVE-25997
If you have time, I would like to ask you to review it.

If anyone knows of any other blocker, I would like to ask them to create a jira for that too.

Thanks,
Peter

On 2022. Mar 2., at 7:04, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
Hello Alessandro,

For the latest commit, loading ORC tables fails (with the log message shown below). Let me try to find the commit that introduced this bug and create a JIRA ticket.

--- Sungwoo

2022-03-02 05:41:56,578 ERROR [Thread-73] exec.StatsTask: Failed to run stats task
java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:622)
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.constructColumnStatsFromPackedRows(ColStatsProcessor.java:105)
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:200)
	at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:93)
	at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:435)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:402)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:306)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
	... 7 more

On Tue, 1 Mar 2022, Alessandro Solimando wrote:
Hi Sungwoo,
the last time I tried to run a TPC-DS-based benchmark I stumbled upon a similar situation. Eventually I found that statistics were not computed, so CBO was not kicking in, and the automatic retry runs with CBO off, which was failing for something like 10 queries (subqueries could not be decorrelated, but there were also some runtime errors).

Making sure that (column) statistics were correctly computed fixed the problem.

Can you check if this is the case for you?

HTH,
Alessandro

On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
Hello Hive team,

I wonder if anyone in the Hive team has tried the TPC-DS benchmark on the master branch recently. We occasionally run TPC-DS system tests using the master branch, and the tests don't succeed completely. Here is how our TPC-DS tests proceed.

1. Compile and run Hive on Tez (not Hive-LLAP)
2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
3. Run 99 TPC-DS queries which were slightly modified to return a varying number of rows (rather than 100 rows)
4. Compare the results against the previous results

The previous results were obtained and cross-checked by running Hive 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their correctness.

For the latest commit in the master branch, step 2 fails. For earlier commits (for example, commits in February 2021), step 3 fails: several queries either fail or return wrong results.

We can compile and report the test results on this mailing list, but would like to know if similar results have been reproduced by the Hive team, in order to make sure that we did not make errors in our tests.

If it is okay to open a JIRA ticket that only reports failures in the TPC-DS test, we could also run git bisect to locate the commit that began to generate wrong results.

--- Sungwoo Park

On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
Hey,

Great to hear that we are on the same side regarding these things :)

For around a week now, we have had nightly builds for the master branch:
http://ci.hive.apache.org/job/hive-nightly/12/

I think we have 1 blocker issue:
https://issues.apache.org/jira/browse/HIVE-25665

I know about one more thing I would rather get fixed before we release it:
https://issues.apache.org/jira/browse/HIVE-25994
The best would be to introduce smoke tests (HIVE-22302) to ensure that something like this will not happen in the future - but we should probably start moving forward.
I think we could call the first iteration of this "4.0.0-alpha-1" :)

I've added 4.0.0-alpha-1 as a version - and added the above two tickets to it:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1

Are there any more things you guys know of which would be needed?

cheers,
Zoltan

On 2/22/22 12:18 PM, Peter Vary wrote:
I would vote for 4.0.0-alpha-1 or similar for all of the components.

When we have more stable releases I would keep the 4.x.x scheme, since everyone is familiar with it, and I do not see a really good reason to change it.

Thanks,
Peter

On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
+1, it would be awesome to see Hive master released after so long.

Either 4.0 or 4.0.0-alpha-1 makes sense to me; not sure how we would pick any 3.x or calendar date (which could tend to slip and be more confusing?).

Thanks in any case for getting the ball rolling.
Szehon

On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
Hey,

Thank you guys for chiming in; versioning is for sure something we should reach common ground on.
It's a threefold problem right now; I think we have the following things:
* storage-api
** we have "2.7.3-SNAPSHOT" in the repo
*** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
** meanwhile we already have 2.8.1 released to Maven Central
*** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
* standalone-metastore
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2
* hive
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2

Regarding the actual version number, I'm not entirely sure where we should start the numbering - that's why I was referring to it as Hive-X in my first letter.

I think the key point here would be to start shipping releases regularly, not the actual version number we will use - I'm kind of open to any versioning scheme which reflects that this is a newer release than 3.1.2.

I could imagine the following ones:
(A) start with something less expected, but keep 3 in the prefix to reflect that this is not yet 4.0
    I can imagine the following numbers:
    3.900.0, 3.901.0, ...
    3.9.0, 3.9.1, ...
(B) start with 4.0.0
    4.0.0, 4.1.0, ...
(C) jump to some calendar-based version number like 2022.2.9
    trunk-based development has pros and cons... making a move like this irreversibly pledges trunk-based development, and makes release branches hard to introduce
(X) somewhat orthogonal is to (also) use some suffixes
    4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
    this is probably the most tempting to use - but this versioning scheme, with non-changing MINOR and PATCH numbers, will also suggest that the actual software is fully compatible and only bugs are being fixed - which will not be true...

I really like the idea of suffixing these releases with alpha or beta - which will communicate our level of commitment: that these are not 100% production-ready artifacts.

I think we could fix HIVE-25665, and probably experiment with 4.0.0-alpha1 for a start...

> This also means there should *not* be a branch-4 after releasing Hive 4.0 and let that diverge (and become the next, super-ignored branch-3),

correct; no need to keep a branch we don't maintain... but in any case I think we can postpone this decision until there is something to release... :)

cheers,
Zoltan

On 2/9/22 10:23 AM, László Bodor wrote:
Hi All!

A purely technical question: what will the SNAPSHOT version become after releasing Hive 4.0.0? I think this is important, as it defines and reflects the future release plans.

Currently it's 4.0.0-SNAPSHOT; I guess it has been since Hive 3.0 + branch-3.
Hive is an evolving and super-active project: if we want to make regular releases, we should simply release Hive 4.0 and bump the pom to 4.1.0-SNAPSHOT, which clearly says that we can release Hive 4.1 anytime we want, without being frustrated about "whether we included enough cool stuff to release 5.0".

This also means there should *not* be a branch-4 after releasing Hive 4.0 and let that diverge (and become the next, super-ignored branch-3). Only when we end up bringing in a minor backward-incompatible thing that needs a 4.0.x will we create *branch-4.0*, on demand. For me, a branch called *branch-4.0* implies neither that I can expect cool releases from there in the future, nor that the branch is maintained and tries to stay in sync with *master*.

Regards,
Laszlo Bodor

On Tue, 8 Feb 2022 at 16:42, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
Hello everyone,
thank you for starting this discussion.

I agree that releasing the master branch regularly and sufficiently often is welcome and vital for the health of the community.

It would be great to hear from others too, especially PMC members and committers, but even simple contributors/followers like myself.

Best regards,
Alessandro

On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com> wrote:
Hello,

Thanks for starting the discussion, Zoltan.

I strongly believe that it is important to have regular and frequent releases; otherwise people will create and maintain separate Hive forks.
The latter is not good for the project, and the community may lose valuable members because of it.

Going forward, I fully agree that there is no point in bringing up strong blockers for the next release. For sure there are many backward-incompatible changes and possibly unstable features, but unless we get a release out it will be difficult to determine what is broken and what needs to be fixed.

Due to the large number of changes that are going to appear in the next version, I would suggest using the terms Hive X-alpha and Hive X-beta for the first few releases. This will make it clear to end users that they need to be careful when upgrading from an older version, and it will give us a bit more time and freedom to treat issues that users will likely discover.

The only real blocker that we may want to treat is HIVE-25665 [1], but we can continue the discussion under that ticket and re-evaluate if necessary.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25665

On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
Hey All,

We haven't made a release for a long time now (3.1.2 was released on 26 August 2019). I think that because we didn't make many branch-3 releases, not many fixes were ported there, which made that release branch kind of erode away.

We have a lot of new features/changes in the current master.
I think instead of aiming for big feature-packed releases we should aim for making a regular release every few months - we should make regular releases which people could install and use.
After all, releasing Hive after more than 2 years would be a big step forward in itself - we have so many improvements that I can't even count them...

But I may not know every aspect of the project or the state of some internal features - so I would like to ask you:
What would be the bare minimum requirements before we could release the current master as Hive X?

There are many nice-to-haves like:
* hadoop upgrade
* jdk11
* remove HoS or MR
* ?
but I don't think these are blockers... we can do any of these in the next release if we start making releases...

cheers,
Zoltan