Re: Start releasing the master branch

Peter Vary Mon, 21 Mar 2022 07:59:09 -0700

Hi Team,

If everyone agrees, tomorrow I would like to start the release process for
4.0.0-alpha-1.


Is there any outstanding blocker jira that you know of?

Thanks,
Peter


> On 2022. Mar 9., at 17:01, Stamatis Zampetakis <[email protected]> wrote:
> 
> I just logged HIVE-26022 [1] which seems to be another potential blocker
> for 4.0.0-alpha-1.
> 
> Best,
> Stamatis
> 
> [1] https://issues.apache.org/jira/browse/HIVE-26022
> 
> On Thu, Mar 3, 2022 at 3:54 PM Peter Vary <[email protected]> wrote:
> 
>> Hi Team,
>> 
>> Here is our status:
>> We collected the blocker tickets and marked them with fixVersion
>> 4.0.0-alpha-1:
>> 
>> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>> <https://issues.apache.org/jira/issues/?filter=-1&jql=project%20=%20HIVE%20AND%20resolution%20=%20Unresolved%20AND%20fixVersion%20=%204.0.0-alpha-1>
>> 
>>   - HIVE-26002 - Create db scripts for 4.0.0-alpha-1
>>   - HIVE-25994 - Analyze table runs into ClassNotFoundException-s in
>>   case binary distribution is used
>>   - HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
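
The filter link above is the JQL query percent-encoded into the jql
parameter. A minimal Python sketch of rebuilding it for another version,
assuming the query text decoded from the link; blocker_filter_url is a
hypothetical helper name, not part of any Jira client:

```python
from urllib.parse import quote, unquote

# Base of the Jira issue navigator, taken from the link in the mail.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def blocker_filter_url(fix_version: str, project: str = "HIVE") -> str:
    """Build the navigator URL for unresolved tickets of one fixVersion."""
    jql = (
        f"project = {project} AND resolution = Unresolved "
        f"AND fixVersion = {fix_version}"
    )
    # safe="" percent-encodes every reserved character, so " " becomes %20
    # and "=" becomes %3D, matching the encoding used in the mail.
    return f"{JIRA_SEARCH}?filter=-1&jql={quote(jql, safe='')}"


url = blocker_filter_url("4.0.0-alpha-1")
print(url)
# Decode the jql parameter again to read the query in plain text.
print(unquote(url.split("jql=", 1)[1]))
```

Passing a different fixVersion string yields the matching filter for a later
release.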
>> 
>> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you
>> happen to know of any other blockers.
>> 
>> We plan to fix these jiras, and then release the following artifacts
>> together:
>> 
>>   - Storage API - 4.0.0-alpha-1
>>   - Standalone Metastore - 4.0.0-alpha-1
>>   - Hive - 4.0.0-alpha-1
>> 
>> 
>> Thanks,
>> Peter
>> 
>> 
>> On 2022. Mar 2., at 11:50, Peter Vary <[email protected]> wrote:
>> 
>> Will continue this discussion on the #hive ASF slack. If you are
>> interested, please join.
>> We will post updates here from time to time, so those who are not using
>> slack can participate that way.
>> 
>> On 2022. Mar 2., at 11:11, Peter Vary <[email protected]> wrote:
>> 
>> Good idea Zoltan, joined the channel.
>> I would like to scope reasonably small, so I agree with focusing on
>> 4.0.0-alpha-1
>> 
>> On 2022. Mar 2., at 11:01, Zoltan Haindrich <[email protected]> wrote:
>> 
>> Hey,
>> 
>> regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
>> * I think we should change all already resolved tickets with fix version
>> 4.0.0 to have fix version 4.0.0-alpha-1
>> ** this could be postponed until we are actually releasing the thing as I
>> think everyone committing to the master is entering 4.0.0 as fix version
>> without much afterthought...this could probably change after we get the
>> first release out.
>> * regarding the existing tickets with fix version/target version 4.0.0
>> - I think that would be a bit too much (>200 tickets)
>> ** some numbers:
>> *** 239 tickets open now
>> *** 224 were not updated in the last 90 days
>> *** 216 were not changed in the last 180 days
>> *** 178 were not updated in the last 360 days
>> ** as a matter of fact I think many of these tickets shouldn't even have a
>> target or fix version - and most of them should be unassigned...I don't
>> want to get lost in this right now...I think for now we should keep the
>> scope small and only care about 4.0.0-alpha-1 tickets
>> 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))
>> 
>> I think for faster communication regarding these things we could also
>> utilize the #hive channel on the ASF slack - what do you guys think?
>> 
>> cheers,
>> Zoltan
>> 
>> On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
>> 
>> Agree with Peter, creating JIRAs is the way to go.
>> Putting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or
>> 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
>> I am mentioning both 4.0.0 and 4.0.0-alpha-1 because eventually I think we
>> are gonna move everything with target 4.0.0 to 4.0.0-alpha-1.
>> 
>> On Wed, Mar 2, 2022 at 9:37 AM Peter Vary <[email protected]> wrote:
>> 
>> Hi Team,
>> 
>> Could we create tickets for the issues?
>> I think it would be good to collect the issues/potential blockers in the
>> jira instead of having a complicated mail thread.
>> 
>> If we set the target version to 4.0.0-alpha-1, then we can easily use the
>> following filter to see the status of the tasks:
>> 
>> 
>> https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22
>> <https://issues.apache.org/jira/issues/?jql=project=%22HIVE%22%20AND%20%22Target%20Version/s%22=%224.0.0-alpha-1%22>
>> 
>> 
>> 
>> @Stamatis: Sadly I have missed your letter/jira and created my own with
>> the fix for building from the src package:
>> https://issues.apache.org/jira/browse/HIVE-25997
>> <https://issues.apache.org/jira/browse/HIVE-25997>
>> If you have time, I would like to ask you to review.
>> 
>> If anyone knows of any blocker I would like to ask them to create a jira
>> for that too.
>> 
>> Thanks,
>> Peter
>> 
>> 
>> On 2022. Mar 2., at 7:04, Sungwoo Park <[email protected]> wrote:
>> 
>> Hello Alessandro,
>> 
>> For the latest commit, loading ORC tables fails (with the log message
>> shown below). Let me try to find a commit that introduces this bug and
>> create a JIRA ticket.
>> 
>> 
>> --- Sungwoo
>> 
>> 2022-03-02 05:41:56,578 ERROR [Thread-73] exec.StatsTask: Failed to run stats task
>> java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:622)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.constructColumnStatsFromPackedRows(ColStatsProcessor.java:105)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:200)
>>     at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:93)
>>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
>> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://blue0:8020/tmp/hive/gitlab-runner/a236e1b4-b354-4343-b900-3d71b1bc7504/hive_2022-03-02_05-40-50_966_446574755576325031-1/-mr-10000/.hive-staging_hive_2022-03-02_05-40-50_966_446574755576325031-1/-ext-10001
>>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
>>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:435)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:402)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:306)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
>>     ... 7 more
>> 
>> On Tue, 1 Mar 2022, Alessandro Solimando wrote:
>> 
>> Hi Sungwoo,
>> last time I tried to run TPCDS-based benchmark I stumbled upon a similar
>> situation, finally I found that statistics were not computed, so CBO was
>> not kicking in, and the automatic retry goes with CBO off which was
>> failing for something like 10 queries (subqueries cannot be decorrelated,
>> but also some runtime errors).
>> 
>> Making sure that (column) statistics were correctly computed fixed the
>> problem.
>> 
>> Can you check if this is the case for you?
>> 
>> HTH,
>> Alessandro
>> 
>> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <[email protected]> wrote:
>> 
>> Hello Hive team,
>> 
>> I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
>> the master branch recently.  We occasionally run TPC-DS system tests
>> using the master branch, and the tests don't succeed completely. Here
>> is how our TPC-DS tests proceed.
>> 
>> 1. Compile and run Hive on Tez (not Hive-LLAP)
>> 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
>> 3. Run 99 TPC-DS queries which were slightly modified to return
>> varying number of rows (rather than 100 rows)
>> 4. Compare the results against the previous results
>> 
>> The previous results were obtained and cross-checked by running Hive
>> 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
>> correctness.
>> 
>> For the latest commit in the master branch, step 2 fails. For earlier
>> commits (for example, commits in February 2021), step 3 fails where
>> several queries either fail or return wrong results.
>> 
>> We can compile and report the test results in this mailing list, but
>> would like to know if similar results have been reproduced by the Hive
>> team, in order to make sure that we did not make errors in our tests.
>> 
>> If it is okay to open a JIRA ticket that only reports failures in the
>> TPC-DS test, we could also perform git bisect to locate the commit
>> that began to generate wrong results.
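
A sketch of how such a bisect could be automated with git bisect run, which
reads the exit code of a test command: 0 marks the commit good, 125 tells git
to skip it, and any other code from 1 to 127 marks it bad. The build and
comparison commands below are placeholders (hypothetical script names); the
thread does not say how the TPC-DS tests are actually driven.

```python
import subprocess
import sys
from typing import Sequence

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_ok: bool, results_match: bool) -> int:
    """Map the outcome of one bisect step to an exit code."""
    if not build_ok:
        return SKIP  # commit cannot be tested, let git pick another one
    return GOOD if results_match else BAD


def bisect_step(build_cmd: Sequence[str], test_cmd: Sequence[str]) -> int:
    """Run the build, then the test, and return the exit code to report."""
    build_ok = subprocess.run(build_cmd).returncode == 0
    # Only run the (expensive) test when the build succeeded.
    results_match = build_ok and subprocess.run(test_cmd).returncode == 0
    return classify(build_ok, results_match)


def main() -> None:
    # Placeholder commands: a build script and a script that runs the
    # queries and compares them against the previous results.
    sys.exit(bisect_step(["./build.sh"], ["./run_tpcds_and_compare.sh"]))
```

Saved as a script that calls main(), this would be driven by
"git bisect start <bad> <good>" followed by "git bisect run python3 <script>".
Returning 125 for commits that do not build keeps broken intermediate commits
from being blamed for the wrong results.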
>> 
>> --- Sungwoo Park
>> 
>> On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
>> 
>> Hey,
>> 
>> Great to hear that we are on the same side regarding these things :)
>> 
>> For around a week now - we have nightly builds for the master branch:
>> http://ci.hive.apache.org/job/hive-nightly/12/
>> 
>> I think we have 1 blocker issue:
>> https://issues.apache.org/jira/browse/HIVE-25665
>> 
>> I know about one more thing I would rather get fixed before we release it:
>> https://issues.apache.org/jira/browse/HIVE-25994
>> The best would be to introduce smoke tests (HIVE-22302) to ensure that
>> something like this will not happen in the future - but we should probably
>> start moving forward.
>> 
>> I think we could call the first iteration of this "4.0.0-alpha-1" :)
>> 
>> I've added 4.0.0-alpha-1 as a version - and added the above two tickets
>> to it.
>> 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>> 
>> 
>> Are there any more things you guys know which would be needed?
>> 
>> cheers,
>> Zoltan
>> 
>> 
>> On 2/22/22 12:18 PM, Peter Vary wrote:
>> 
>> I would vote for 4.0.0-alpha-1 or similar for all of the components.
>> 
>> When we have more stable releases I would keep the 4.x.x schema, since
>> everyone is familiar with it, and I do not see a really good reason to
>> change it.
>> 
>> Thanks,
>> Peter
>> 
>> 
>> On 2022. Feb 10., at 3:34, Szehon Ho <[email protected]> wrote:
>> 
>> 
>> +1 that would be awesome to see Hive master released after so long.
>> 
>> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
>> any 3.x or calendar date (which could tend to slip and be more
>> confusing?).
>> 
>> Thanks in any case to get the ball rolling.
>> Szehon
>> 
>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <[email protected]> wrote:
>> 
>> 
>> Hey,
>> 
>> Thank you guys for chiming in; versioning is for sure something we should
>> get to some common ground.
>> It's a triple problem right now; I think we have the following things:
>> 
>> * storage-api
>> ** we have "2.7.3-SNAPSHOT" in the repo
>> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
>> ** meanwhile we already have 2.8.1 released to maven central
>> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
>> * standalone-metastore
>> ** 4.0.0-SNAPSHOT in the repo
>> ** last release is 3.1.2
>> * hive
>> ** 4.0.0-SNAPSHOT in the repo
>> ** last release is 3.1.2
>> 
>> Regarding the actual version number I'm not entirely sure where we should
>> start the numbering - that's why I was referring to it as Hive-X in my
>> first letter.
>> 
>> I think the key point here would be to start shipping releases regularly
>> and not the actual version number we will use - I'm kinda open to any
>> versioning scheme which reflects that this is a newer release than 3.1.2.
>> 
>> I could imagine the following ones:
>> (A) start with something less expected; but keep 3 in the prefix to
>> reflect that this is not yet 4.0
>>  I can imagine the following numbers:
>>  3.900.0, 3.901.0, ...
>>  3.9.0, 3.9.1, ...
>> (B) start 4.0.0
>>  4.0.0, 4.1.0, ...
>> (C) jump to some calendar based version number like 2022.2.9
>>  trunk based development has pros and cons...making a move like this
>>  irreversibly pledges trunk based development; and makes release branches
>>  hard to introduce
>> (X) somewhat orthogonal is to (also) use some suffixes
>>  4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
>>  this is probably the most tempting to use - but this versioning
>>  schema with a non-changing MINOR and PATCH number will
>>  also suggest that the actual software is fully compatible - and only
>>  bugs are being fixed - which will not be true...
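
To make the ordering implied by these options concrete, here is a small
Python sketch that sorts version strings so that alpha and beta suffixes come
before the plain release with the same numbers. It is a simplified ordering
written for illustration, not Maven's own version comparison, and version_key
is a hypothetical helper.

```python
import re

# Rank of the suffix: alpha < beta < plain release (no suffix).
_QUALIFIER_RANK = {"alpha": 0, "beta": 1, "": 2}

# Accepts "4.0.0", "4.0.0-alpha1" and "4.0.0-alpha-1" style strings.
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(?:-(alpha|beta)-?(\d+))?")


def version_key(version: str):
    """Return a sort key: (numeric parts, suffix rank, suffix number)."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    qualifier = match.group(2) or ""
    qualifier_number = int(match.group(3) or 0)
    return (numbers, _QUALIFIER_RANK[qualifier], qualifier_number)


candidates = ["4.0.0", "3.1.2", "4.0.0-alpha-1", "4.0.0-beta1",
              "4.0.0-alpha2", "3.900.0"]
print(sorted(candidates, key=version_key))
```

Under this key every candidate from (A), (B) and (X) sorts after 3.1.2, and
the suffixed 4.0.0 versions sort before 4.0.0 itself.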
>> 
>> I really like the idea to suffix these releases with alpha or beta - which
>> will communicate our level of commitment that these are not 100% production
>> ready artifacts.
>> 
>> I think we could fix HIVE-25665; and probably experiment with
>> 4.0.0-alpha1
>> for start...
>> 
>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
>>> and let that diverge (and becomes the next, super-ignored branch-3),
>> 
>> correct; no need to keep a branch we don't maintain...but in any case I
>> think we can postpone this decision until there will be something to
>> release... :)
>> 
>> cheers,
>> Zoltan
>> 
>> 
>> 
>> On 2/9/22 10:23 AM, László Bodor wrote:
>> 
>> Hi All!
>> 
>> A purely technical question: what will the SNAPSHOT version become after
>> releasing Hive 4.0.0? I think this is important, as it defines and reflects
>> the future release plans.
>> 
>> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
>> Hive is an evolving and super-active project: if we want to make regular
>> releases, we should simply release Hive 4.0 and bump pom to 4.1.0-SNAPSHOT,
>> which clearly says that we can release Hive 4.1 anytime we want, without
>> being frustrated about "whether we included enough cool stuff to release
>> 5.0".
>> 
>> This also means there should *not* be a branch-4 after releasing Hive 4.0
>> and let that diverge (and becomes the next, super-ignored branch-3), only
>> when we end up bringing a minor backward-incompatible thing that needs a
>> 4.0.x, and when it happens, we'll create *branch-4.0* on demand. For me, a
>> branch called *branch-4.0* doesn't imply either I can expect cool releases
>> in the future from there or the branch is maintained and tries to be in
>> sync with the *master*.
>> 
>> Regards,
>> Laszlo Bodor
>> 
>> Alessandro Solimando <[email protected]> wrote (on Tue, Feb 8, 2022, 16:42):
>> 
>> Hello everyone,
>> thank you for starting this discussion.
>> 
>> I agree that releasing the master branch regularly and sufficiently often
>> is welcome and vital for the health of the community.
>> 
>> It would be great to hear from others too, especially PMC members and
>> committers, but even simple contributors/followers as myself.
>> 
>> Best regards,
>> Alessandro
>> 
>> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <[email protected]> wrote:
>> 
>> Hello,
>> 
>> Thanks for starting the discussion Zoltan.
>> 
>> I strongly believe that it is important to have regular and frequent
>> releases, otherwise people will create and maintain separate Hive forks.
>> The latter is not good for the project and the community may lose valuable
>> members because of it.
>> 
>> Going forward I fully agree that there is no point bringing up strong
>> blockers for the next release. For sure there are many backward
>> incompatible changes and possibly unstable features but unless we get a
>> release out it will be difficult to determine what is broken and what needs
>> to be fixed.
>> 
>> Due to the big number of changes that are going to appear in the next
>> version I would suggest using the terms Hive X-alpha, Hive X-beta for the
>> first few releases. This will make it clear to the end users that they need
>> to be careful when upgrading from an older version and it will give us a
>> bit more time and freedom to treat issues that the users will likely
>> discover.
>> 
>> The only real blocker that we may want to treat is HIVE-25665 [1] but we
>> can continue the discussion under that ticket and re-evaluate if necessary.
>> 
>> 
>> Best,
>> Stamatis
>> 
>> [1] https://issues.apache.org/jira/browse/HIVE-25665
>> 
>> 
>> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <[email protected]> wrote:
>> 
>> Hey All,
>> 
>> We haven't made a release for a long time now (3.1.2 was released on 26
>> August 2019) - and I think because we didn't make that many branch-3
>> releases, not too many fixes were ported there - which made that release
>> branch kinda erode away.
>> 
>> We have a lot of new features/changes in the current master.
>> I think instead of aiming for big feature-packed releases we should aim
>> for making a regular release every few months - we should make regular
>> releases which people could install and use.
>> After all releasing Hive after more than 2 years would be a big step
>> forward in itself - we have so many improvements that I can't even
>> count...
>> 
>> But I may not know every aspect of the project / the state of some
>> internal features - so I would like to ask you:
>> What would be the bare minimum requirements before we could release the
>> current master as Hive X?
>> 
>> There are many nice-to-have-s like:
>> * hadoop upgrade
>> * jdk11
>> * remove HoS or MR
>> * ?
>> but I don't think these are blockers...we can make any of these in the
>> next release if we start making them...
>> 
>> cheers,
>> Zoltan
>> 
>> 
