I just logged HIVE-26022 [1] which seems to be another potential blocker for 4.0.0-alpha-1.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26022

On Thu, Mar 3, 2022 at 3:54 PM Peter Vary <pv...@cloudera.com> wrote:

> Hi Team,
>
> Here is our status:
> We collected the blocker tickets and marked them with fixVersion 4.0.0-alpha-1:
>
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>
> - HIVE-26002 - Create db scripts for 4.0.0-alpha-1
> - HIVE-25994 - Analyze table runs into ClassNotFoundException-s in case binary distribution is used
> - HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
>
> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you happen to know of any other blockers.
>
> We plan to fix these jiras, and then release the following artifacts together:
>
> - Storage API - 4.0.0-alpha-1
> - Standalone Metastore - 4.0.0-alpha-1
> - Hive - 4.0.0-alpha-1
>
> Thanks,
> Peter
>
> On 2022. Mar 2., at 11:50, Peter Vary <pv...@cloudera.com> wrote:
>
> Will continue this discussion on the #hive ASF slack. If you are interested, please join.
> We will post updates here from time to time, so those who are not using Slack can participate that way.
>
> On 2022. Mar 2., at 11:11, Peter Vary <pv...@cloudera.com> wrote:
>
> Good idea Zoltan, joined the channel.
> I would like to keep the scope reasonably small, so I agree with focusing on 4.0.0-alpha-1.
>
> On 2022. Mar 2., at 11:01, Zoltan Haindrich <k...@rxd.hu> wrote:
>
> Hey,
>
> regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
> * I think we should change all already resolved tickets with fix version 4.0.0 to have fix version 4.0.0-alpha-1
> ** this could be postponed until we are actually releasing the thing as I think everyone committing to the master is entering 4.0.0 as fix version without much afterthought...this could probably change after we get the first release out.
> * regarding the existing tickets with fix version/target version 4.0.0 - I think that would be a bit too much (>200 tickets)
> ** some numbers:
> *** 239 tickets open now
> *** 224 were not updated in the last 90 days
> *** 216 were not changed in the last 180 days
> *** 178 were not updated in the last 360 days
> ** as a matter of fact I think many of these tickets shouldn't even have a target or fix version - and most of them should be unassigned...I don't want to get lost in this right now...I think for now we should keep the scope small and only care about 4.0.0-alpha-1 tickets
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))
>
> I think for faster communication regarding these things we could also utilize the #hive channel on the ASF slack - what do you guys think?
>
> cheers,
> Zoltan
>
> On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
>
> Agree with Peter, creating JIRAs is the way to go.
> Putting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
> I am mentioning both 4.0.0 and 4.0.0-alpha-1 because eventually I think we are gonna move everything with target 4.0.0 to 4.0.0-alpha-1.
>
> On Wed, Mar 2, 2022 at 9:37 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
>
> Hi Team,
>
> Could we create tickets for the issues?
> I think it would be good to collect the issues/potential blockers in the jira instead of having a complicated mail thread.
>
> If we set the target version to 4.0.0-alpha-1, then we can easily use the following filter to see the status of the tasks:
>
> https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22
>
> @Stamatis: Sadly I have missed your letter/jira and created my own with the fix for building from the src package:
> https://issues.apache.org/jira/browse/HIVE-25997
> If you have time, I would like to ask you to review.
>
> If anyone knows of any blocker I would like to ask them to create a jira for that too.
>
> Thanks,
> Peter
>
> On 2022. Mar 2., at 7:04, Sungwoo Park <c...@pl.postech.ac.kr> wrote:
>
> Hello Alessandro,
>
> For the latest commit, loading ORC tables fails (with the log message shown below). Let me try to find a commit that introduces this bug and create a JIRA ticket.
>
> --- Sungwoo
>
> 2022-03-02 05:41:56,578 ERROR [Thread-73] exec.StatsTask: Failed to run stats task
> java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:622)
>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.constructColumnStatsFromPackedRows(ColStatsProcessor.java:105)
>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:200)
>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:93)
>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:435)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:402)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:306)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
>     ... 7 more
>
> On Tue, 1 Mar 2022, Alessandro Solimando wrote:
>
> Hi Sungwoo,
> last time I tried to run a TPCDS-based benchmark I stumbled upon a similar situation. In the end I found that statistics were not computed, so CBO was not kicking in, and the automatic retry runs with CBO off, which was failing for something like 10 queries (subqueries cannot be decorrelated, but also some runtime errors).
>
> Making sure that (column) statistics were correctly computed fixed the problem.
>
> Can you check if this is the case for you?
>
> HTH,
> Alessandro
>
> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
>
> Hello Hive team,
>
> I wonder if anyone in the Hive team has tried the TPC-DS benchmark on the master branch recently. We occasionally run TPC-DS system tests using the master branch, and the tests don't succeed completely. Here is how our TPC-DS tests proceed.
>
> 1. Compile and run Hive on Tez (not Hive-LLAP)
> 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
> 3. Run 99 TPC-DS queries which were slightly modified to return varying number of rows (rather than 100 rows)
> 4. Compare the results against the previous results
>
> The previous results were obtained and cross-checked by running Hive 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their correctness.
>
> For the latest commit in the master branch, step 2 fails. For earlier commits (for example, commits in February 2021), step 3 fails where several queries either fail or return wrong results.
>
> We can compile and report the test results in this mailing list, but would like to know if similar results have been reproduced by the Hive team, in order to make sure that we did not make errors in our tests.
>
> If it is okay to open a JIRA ticket that only reports failures in the TPC-DS test, we could also perform git bisect to locate the commit that began to generate wrong results.
>
> --- Sungwoo Park
>
> On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
>
> Hey,
>
> Great to hear that we are on the same side regarding these things :)
>
> For around a week now - we have nightly builds for the master branch:
> http://ci.hive.apache.org/job/hive-nightly/12/
>
> I think we have 1 blocker issue:
> https://issues.apache.org/jira/browse/HIVE-25665
>
> I know about one more thing I would rather get fixed before we release it:
> https://issues.apache.org/jira/browse/HIVE-25994
> The best would be to introduce smoke tests (HIVE-22302) to ensure that something like this will not happen in the future - but we should probably start moving forward.
>
> I think we could call the first iteration of this "4.0.0-alpha-1" :)
>
> I've added 4.0.0-alpha-1 as a version - and added the above two tickets to it.
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>
> Are there any more things you guys know which would be needed?
>
> cheers,
> Zoltan
>
> On 2/22/22 12:18 PM, Peter Vary wrote:
>
> I would vote for 4.0.0-alpha-1 or similar for all of the components.
>
> When we have more stable releases I would keep the 4.x.x schema, since everyone is familiar with it, and I do not see a really good reason to change it.
>
> Thanks,
> Peter
>
> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
>
> +1 that would be awesome to see Hive master released after so long.
> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick any 3.x or calendar date (which could tend to slip and be more confusing?).
>
> Thanks in any case for getting the ball rolling.
> Szehon
>
> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> Hey,
>
> Thank you guys for chiming in; versioning is for sure something we should get to some common ground on.
> It's a triple problem right now; I think we have the following things:
>
> * storage-api
> ** we have "2.7.3-SNAPSHOT" in the repo
> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> ** meanwhile we already have 2.8.1 released to maven central
> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> * standalone-metastore
> ** 4.0.0-SNAPSHOT in the repo
> ** last release is 3.1.2
> * hive
> ** 4.0.0-SNAPSHOT in the repo
> ** last release is 3.1.2
>
> Regarding the actual version number I'm not entirely sure where we should start the numbering - that's why I was referring to it as Hive-X in my first letter.
>
> I think the key point here would be to start shipping releases regularly and not the actual version number we will use - I'm kinda open to any versioning scheme which reflects that this is a newer release than 3.1.2.
>
> I could imagine the following ones:
> (A) start with something less expected; but keep 3 in the prefix to reflect that this is not yet 4.0
>     I can imagine the following numbers:
>     3.900.0, 3.901.0, ...
>     3.9.0, 3.9.1, ...
> (B) start 4.0.0
>     4.0.0, 4.1.0, ...
> (C) jump to some calendar based version number like 2022.2.9
>     trunk based development has pros and cons...making a move like this irreversibly pledges trunk based development; and makes release branches hard to introduce
> (X) somewhat orthogonal is to (also) use some suffixes
>     4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
>     this is probably the most tempting to use - but this versioning schema with a non-changing MINOR and PATCH number will also suggest that the actual software is fully compatible - and only bugs are being fixed - which will not be true...
>
> I really like the idea to suffix these releases with alpha or beta - which will communicate our level of commitment: that these are not 100% production ready artifacts.
>
> I think we could fix HIVE-25665; and probably experiment with 4.0.0-alpha1 for start...
>
> > This also means there should *not* be a branch-4 after releasing Hive 4.0 and let that diverge (and becomes the next, super-ignored branch-3),
>
> correct; no need to keep a branch we don't maintain...but in any case I think we can postpone this decision until there will be something to release... :)
>
> cheers,
> Zoltan
>
> On 2/9/22 10:23 AM, László Bodor wrote:
>
> Hi All!
>
> A purely technical question: what will the SNAPSHOT version become after releasing Hive 4.0.0? I think this is important, as it defines and reflects the future release plans.
>
> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
>
> Hive is an evolving and super-active project: if we want to make regular releases, we should simply release Hive 4.0 and bump pom to 4.1.0-SNAPSHOT, which clearly says that we can release Hive 4.1 anytime we want, without being frustrated about "whether we included enough cool stuff to release 5.0".
>
> This also means there should *not* be a branch-4 after releasing Hive 4.0 and let that diverge (and becomes the next, super-ignored branch-3), only when we end up bringing a minor backward-incompatible thing that needs a 4.0.x, and when it happens, we'll create *branch-4.0* on demand. For me, a branch called *branch-4.0* doesn't imply either I can expect cool releases in the future from there or the branch is maintained and tries to be in sync with the *master*.
>
> Regards,
> Laszlo Bodor
>
> On Tue, Feb 8, 2022 at 16:42, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
>
> Hello everyone,
> thank you for starting this discussion.
>
> I agree that releasing the master branch regularly and sufficiently often is welcome and vital for the health of the community.
>
> It would be great to hear from others too, especially PMC members and committers, but even simple contributors/followers like myself.
>
> Best regards,
> Alessandro
>
> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> Hello,
>
> Thanks for starting the discussion Zoltan.
>
> I strongly believe that it is important to have regular and frequent releases, otherwise people will create and maintain separate Hive forks.
> The latter is not good for the project and the community may lose valuable members because of it.
>
> Going forward I fully agree that there is no point bringing up strong blockers for the next release. For sure there are many backward incompatible changes and possibly unstable features but unless we get a release out it will be difficult to determine what is broken and what needs to be fixed.
>
> Due to the big number of changes that are going to appear in the next version I would suggest using the terms Hive X-alpha, Hive X-beta for the first few releases.
> This will make it clear to the end users that they need to be careful when upgrading from an older version and it will give us a bit more time and freedom to treat issues that the users will likely discover.
>
> The only real blocker that we may want to treat is HIVE-25665 [1] but we can continue the discussion under that ticket and re-evaluate if necessary.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-25665
>
> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>
> Hey All,
>
> We haven't made a release for a long time now (3.1.2 was released on 26 August 2019) - and I think because we didn't make that many branch-3 releases, not too many fixes were ported there - which made that release branch kinda erode away.
>
> We have a lot of new features/changes in the current master.
> I think instead of aiming for big feature-packed releases we should aim for making a regular release every few months - we should make regular releases which people could install and use.
> After all, releasing Hive after more than 2 years would be a big step forward in itself - we have so many improvements that I can't even count...
>
> But I may not know every aspect of the project / the state of some internal features - so I would like to ask you:
> What would be the bare minimum requirements before we could release the current master as Hive X?
>
> There are many nice-to-have-s like:
> * hadoop upgrade
> * jdk11
> * remove HoS or MR
> * ?
> but I don't think these are blockers...we can make any of these in the next release if we start making them...
>
> cheers,
> Zoltan