Hi Team,

If everyone agrees, tomorrow I would like to start the release process for 4.0.0-alpha-1.
Is there any outstanding blocker jira that you know of?

Thanks,
Peter

> On 2022. Mar 9., at 17:01, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> I just logged HIVE-26022 [1] which seems to be another potential blocker
> for 4.0.0-alpha-1.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-26022
>
> On Thu, Mar 3, 2022 at 3:54 PM Peter Vary <pv...@cloudera.com> wrote:
>
>> Hi Team,
>>
>> Here is our status:
>> We collected the blocker tickets and marked them with fixVersion
>> 4.0.0-alpha-1:
>>
>> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>>
>> - HIVE-26002 - Create db scripts for 4.0.0-alpha-1
>> - HIVE-25994 - Analyze table runs into ClassNotFoundException-s in
>>   case binary distribution is used
>> - HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
>>
>> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you
>> happen to know of any other blockers.
>>
>> We plan to fix these jiras, and then release the following artifacts
>> together:
>>
>> - Storage API - 4.0.0-alpha-1
>> - Standalone Metastore - 4.0.0-alpha-1
>> - Hive - 4.0.0-alpha-1
>>
>> Thanks,
>> Peter
>>
>> On 2022. Mar 2., at 11:50, Peter Vary <pv...@cloudera.com> wrote:
>>
>> Will continue this discussion on the #hive ASF slack. If you are
>> interested, please join.
>> We will post updates here from time to time, so those who are not using
>> slack can participate that way.
>>
>> On 2022. Mar 2., at 11:11, Peter Vary <pv...@cloudera.com> wrote:
>>
>> Good idea Zoltan, joined the channel.
>> I would like to scope reasonably small, so I agree with focusing on
>> 4.0.0-alpha-1
>>
>> On 2022. Mar 2., at 11:01, Zoltan Haindrich <k...@rxd.hu> wrote:
>>
>> Hey,
>>
>> regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
>> * I think we should change all already resolved tickets with fix version
>>   4.0.0 to have fix version 4.0.0-alpha-1
>> ** this could be postponed until we are actually releasing the thing as I
>>    think everyone committing to the master is entering 4.0.0 as fix version
>>    without much afterthought...this could probably change after we get the
>>    first release out.
>> * regarding the existing tickets with fix version/target version 4.0.0
>>   - I think that would be a bit too much (>200 tickets)
>> ** some numbers:
>> *** 239 tickets open now
>> *** 224 were not updated in the last 90 days
>> *** 216 were not changed in the last 180 days
>> *** 178 were not updated in the last 360 days
>> ** as a matter of fact I think many of these tickets shouldn't even have a
>>    target or fix version - and most of them should be unassigned...I don't
>>    want to get lost in this right now...I think for now we should keep the
>>    scope small and only care about 4.0.0-alpha-1 tickets
>>
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))
>>
>> I think for faster communication regarding these things we could also
>> utilize the #hive channel on the ASF slack - what do you guys think?
>>
>> cheers,
>> Zoltan
>>
>> On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
>>
>> Agree with Peter, creating JIRAs is the way to go.
>> Putting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or
>> 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
>> I am mentioning both 4.0.0 and 4.0.0-alpha-1 because eventually I think we
>> are gonna move everything with target 4.0.0 to 4.0.0-alpha-1.
>>
>> On Wed, Mar 2, 2022 at 9:37 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
>>
>> Hi Team,
>>
>> Could we create tickets for the issues?
>> I think it would be good to collect the issues/potential blockers in the
>> jira instead of having a complicated mail thread.
>>
>> If we set the target version to 4.0.0-alpha-1, then we can easily use the
>> following filter to see the status of the tasks:
>>
>> https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22
>>
>> @Stamatis: Sadly I have missed your letter/jira and created my own with
>> the fix for building from the src package:
>> https://issues.apache.org/jira/browse/HIVE-25997
>> If you have time, I would like to ask you to review.
>>
>> If anyone knows of any blocker I would like to ask them to create a jira
>> for that too.
>>
>> Thanks,
>> Peter
>>
>> On 2022. Mar 2., at 7:04, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
>>
>> Hello Alessandro,
>>
>> For the latest commit, loading ORC tables fails (with the log message
>> shown below). Let me try to find a commit that introduces this bug and
>> create a JIRA ticket.
>>
>> --- Sungwoo
>>
>> 2022-03-02 05:41:56,578 ERROR [Thread-73] exec.StatsTask: Failed to run stats task
>> java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:622)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.constructColumnStatsFromPackedRows(ColStatsProcessor.java:105)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:200)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:93)
>>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
>> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:435)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:402)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:306)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
>>     ... 7 more
>>
>> On Tue, 1 Mar 2022, Alessandro Solimando wrote:
>>
>> Hi Sungwoo,
>> last time I tried to run TPCDS-based benchmark I stumbled upon a similar
>> situation, finally I found that statistics were not computed, so CBO was
>> not kicking in, and the automatic retry goes with CBO off which was
>> failing for something like 10 queries (subqueries cannot be decorrelated,
>> but also some runtime errors).
>>
>> Making sure that (column) statistics were correctly computed fixed the
>> problem.
>>
>> Can you check if this is the case for you?
>>
>> HTH,
>> Alessandro
>>
>> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
>>
>> Hello Hive team,
>>
>> I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
>> the master branch recently. We occasionally run TPC-DS system tests
>> using the master branch, and the tests don't succeed completely. Here
>> is how our TPC-DS tests proceed.
>>
>> 1. Compile and run Hive on Tez (not Hive-LLAP)
>> 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
>> 3. Run 99 TPC-DS queries which were slightly modified to return
>>    varying number of rows (rather than 100 rows)
>> 4. Compare the results against the previous results
>>
>> The previous results were obtained and cross-checked by running Hive
>> 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
>> correctness.
>>
>> For the latest commit in the master branch, step 2 fails.
>> For earlier commits (for example, commits in February 2021), step 3 fails
>> where several queries either fail or return wrong results.
>>
>> We can compile and report the test results in this mailing list, but
>> would like to know if similar results have been reproduced by the Hive
>> team, in order to make sure that we did not make errors in our tests.
>>
>> If it is okay to open a JIRA ticket that only reports failures in the
>> TPC-DS test, we could also perform git bisect to locate the commit
>> that began to generate wrong results.
>>
>> --- Sungwoo Park
>>
>> On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
>>
>> Hey,
>>
>> Great to hear that we are on the same side regarding these things :)
>>
>> For around a week now - we have nightly builds for the master branch:
>> http://ci.hive.apache.org/job/hive-nightly/12/
>>
>> I think we have 1 blocker issue:
>> https://issues.apache.org/jira/browse/HIVE-25665
>>
>> I know about one more thing I would rather get fixed before we release it:
>> https://issues.apache.org/jira/browse/HIVE-25994
>> The best would be to introduce smoke tests (HIVE-22302) to ensure that
>> something like this will not happen in the future - but we should
>> probably start moving forward.
>>
>> I think we could call the first iteration of this "4.0.0-alpha-1" :)
>>
>> I've added 4.0.0-alpha-1 as a version - and added the above two tickets
>> to it.
>>
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>>
>> Are there any more things you guys know which would be needed?
>>
>> cheers,
>> Zoltan
>>
>> On 2/22/22 12:18 PM, Peter Vary wrote:
>>
>> I would vote for 4.0.0-alpha-1 or similar for all of the components.
>>
>> When we have more stable releases I would keep the 4.x.x schema, since
>> everyone is familiar with it, and I do not see a really good reason to
>> change it.
>>
>> Thanks,
>> Peter
>>
>> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>> +1 that would be awesome to see Hive master released after so long.
>>
>> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
>> any 3.x or calendar date (which could tend to slip and be more
>> confusing?).
>>
>> Thanks in any case to get the ball rolling.
>> Szehon
>>
>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>>
>> Hey,
>>
>> Thank you guys for chiming in; versioning is for sure something we should
>> get to some common ground.
>> It's a triple problem right now; I think we have the following things:
>>
>> * storage-api
>> ** we have "2.7.3-SNAPSHOT" in the repo
>> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
>> ** meanwhile we already have 2.8.1 released to maven central
>> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
>> * standalone-metastore
>> ** 4.0.0-SNAPSHOT in the repo
>> ** last release is 3.1.2
>> * hive
>> ** 4.0.0-SNAPSHOT in the repo
>> ** last release is 3.1.2
>>
>> Regarding the actual version number I'm not entirely sure where we should
>> start the numbering - that's why I was referring to it as Hive-X in my
>> first letter.
>>
>> I think the key point here would be to start shipping releases regularly
>> and not the actual version number we will use - I'm kinda open to any
>> versioning scheme which reflects that this is a newer release than 3.1.2.
>>
>> I could imagine the following ones:
>> (A) start with something less expected; but keep 3 in the prefix to
>>     reflect that this is not yet 4.0
>>     I can imagine the following numbers:
>>     3.900.0, 3.901.0, ...
>>     3.9.0, 3.9.1, ...
>> (B) start 4.0.0
>>     4.0.0, 4.1.0, ...
>> (C) jump to some calendar based version number like 2022.2.9
>>     trunk based development has pros and cons...making a move like this
>>     irreversibly pledges trunk based development; and makes release
>>     branches hard to introduce
>> (X) somewhat orthogonal is to (also) use some suffixes
>>     4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
>>     this is probably the most tempting to use - but this versioning
>>     schema with a non-changing MINOR and PATCH number will
>>     also suggest that the actual software is fully compatible - and only
>>     bugs are being fixed - which will not be true...
>>
>> I really like the idea to suffix these releases with alpha or beta -
>> which will communicate our level of commitment that these are not 100%
>> production ready artifacts.
>>
>> I think we could fix HIVE-25665; and probably experiment with
>> 4.0.0-alpha1 for start...
>>
>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
>>> and let that diverge (and becomes the next, super-ignored branch-3),
>>
>> correct; no need to keep a branch we don't maintain...but in any case I
>> think we can postpone this decision until there will be something to
>> release... :)
>>
>> cheers,
>> Zoltan
>>
>> On 2/9/22 10:23 AM, László Bodor wrote:
>>
>> Hi All!
>>
>> A purely technical question: what will the SNAPSHOT version become after
>> releasing Hive 4.0.0? I think this is important, as it defines and
>> reflects the future release plans.
>>
>> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
>>
>> Hive is an evolving and super-active project: if we want to make regular
>> releases, we should simply release Hive 4.0 and bump pom to
>> 4.1.0-SNAPSHOT, which clearly says that we can release Hive 4.1 anytime
>> we want, without being frustrated about "whether we included enough cool
>> stuff to release 5.0".
>>
>> This also means there should *not* be a branch-4 after releasing Hive 4.0
>> and let that diverge (and becomes the next, super-ignored branch-3), only
>> when we end up bringing a minor backward-incompatible thing that needs a
>> 4.0.x, and when it happens, we'll create *branch-4.0* on demand. For me, a
>> branch called *branch-4.0* doesn't imply either I can expect cool releases
>> in the future from there or the branch is maintained and tries to be in
>> sync with the *master*.
>>
>> Regards,
>> Laszlo Bodor
>>
>> Alessandro Solimando <alessandro.solima...@gmail.com> wrote (on Tue, Feb 8, 2022, 16:42):
>>
>> Hello everyone,
>> thank you for starting this discussion.
>>
>> I agree that releasing the master branch regularly and sufficiently often
>> is welcome and vital for the health of the community.
>>
>> It would be great to hear from others too, especially PMC members and
>> committers, but even simple contributors/followers as myself.
>>
>> Best regards,
>> Alessandro
>>
>> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>>
>> Hello,
>>
>> Thanks for starting the discussion Zoltan.
>>
>> I strongly believe that it is important to have regular and frequent
>> releases, otherwise people will create and maintain separate Hive forks.
>> The latter is not good for the project and the community may lose valuable
>> members because of it.
>>
>> Going forward I fully agree that there is no point bringing up strong
>> blockers for the next release. For sure there are many backward
>> incompatible changes and possibly unstable features but unless we get a
>> release out it will be difficult to determine what is broken and what
>> needs to be fixed.
>>
>> Due to the big number of changes that are going to appear in the next
>> version I would suggest using the terms Hive X-alpha, Hive X-beta for the
>> first few releases. This will make it clear to the end users that they
>> need to be careful when upgrading from an older version and it will give
>> us a bit more time and freedom to treat issues that the users will likely
>> discover.
>>
>> The only real blocker that we may want to treat is HIVE-25665 [1] but we
>> can continue the discussion under that ticket and re-evaluate if
>> necessary.
>>
>> Best,
>> Stamatis
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-25665
>>
>> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>>
>> Hey All,
>>
>> We haven't made a release for a long time now (3.1.2 was released on 26
>> August 2019) - and I think because we didn't make that many branch-3
>> releases, not too many fixes were ported there - which made that release
>> branch kinda erode away.
>>
>> We have a lot of new features/changes in the current master.
>> I think instead of aiming for big feature-packed releases we should aim
>> for making a regular release every few months - we should make regular
>> releases which people could install and use.
>> After all, releasing Hive after more than 2 years would be a big step
>> forward in itself - we have so many improvements that I can't even count...
>>
>> But I may not know every aspect of the project / the state of some
>> internal features - so I would like to ask you:
>> What would be the bare minimum requirements before we could release the
>> current master as Hive X?
>>
>> There are many nice-to-have-s like:
>> * hadoop upgrade
>> * jdk11
>> * remove HoS or MR
>> * ?
>> but I don't think these are blockers...we can make any of these in the
>> next release if we start making them...
>>
>> cheers,
>> Zoltan