For correctness, we can merge a few pull requests and change the default values of a few configuration parameters, so that we can get the correct results for the TPC-DS benchmark.

Another issue is a performance regression when compared with Hive 3.1. I ran the TPC-DS benchmark using a scale factor of 10TB. Our internal testing shows that the current snapshot of Hive 4 is 1.5 times slower than Hive 3.1. Here is a summary of our internal testing on a cluster with 13 nodes, each with 256GB memory and 6 SSDs.

Systems compared:

1. Trino 417 (using Java 11)
2. Hive 3.1 (a fork maintained by us)
3. Hive 4.0.0-SNAPSHOT (as of February 2023)

Results:

1. Trino 417
total execution time = 9633 seconds, geometric mean = 28.19 seconds
query 21 returns wrong results.
query 23 returns wrong results.
query 72 fails (with query.max-memory = 1440GB)

2. Hive 3.1
total execution time = 9900 seconds, geometric mean = 31.67 seconds
All the 99 queries return correct results.

3. Hive 4.0.0-SNAPSHOT
total execution time = 10584 seconds, geometric mean = 43.72 seconds
All the 99 queries return correct results.

Around the summer of 2020, Hive 4.0.0-SNAPSHOT was noticeably faster than Hive 3.1, although a few queries returned wrong results.

I am not sure how to fix the performance regression. Git bisecting is not a practical option because 1) until last year, building 4.0.0-SNAPSHOT was not smooth because of the Tez dependency; 2) loading 10TB of TPC-DS data for each commit is too much overhead.

I am thinking about comparing DAG plans from Hive 3.1 and 4.0.0-SNAPSHOT for those queries that exhibit performance regression. If you have any suggestions, please let me know.
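
As a rough illustration of that plan comparison, here is a minimal Python sketch. It assumes per-query execution times and EXPLAIN output have already been collected for both versions; the function names, the 1.5x threshold, and the sample timings are all hypothetical and are not taken from the benchmark above.

```python
import difflib
from math import exp, log


def geometric_mean(times):
    """Geometric mean of positive per-query execution times (seconds)."""
    return exp(sum(log(t) for t in times) / len(times))


def regressed_queries(old, new, threshold=1.5):
    """Return (query, slowdown) pairs where new/old >= threshold, worst first."""
    ratios = {q: new[q] / old[q] for q in old if q in new}
    hits = [(q, r) for q, r in ratios.items() if r >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def plan_diff(old_plan, new_plan):
    """Unified diff of two EXPLAIN outputs, returned as a list of lines."""
    return list(difflib.unified_diff(
        old_plan.splitlines(), new_plan.splitlines(),
        fromfile="hive-3.1", tofile="hive-4.0.0-SNAPSHOT", lineterm=""))


# Hypothetical timings in seconds, for illustration only.
hive3 = {"query 1": 10.0, "query 2": 40.0, "query 3": 20.0}
hive4 = {"query 1": 11.0, "query 2": 80.0, "query 3": 50.0}

for query, slowdown in regressed_queries(hive3, hive4):
    print(f"{query}: {slowdown:.2f}x slower")
```

Only the queries above the threshold would then need their plans diffed with plan_diff, which keeps the manual inspection small compared with bisecting commits.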

--- Sungwoo

On Tue, 21 Mar 2023, Stamatis Zampetakis wrote:

Many thanks for running tests with 4.0.0 Sungwoo; it is invaluable
help for getting out a stable Hive 4.

I will review https://issues.apache.org/jira/browse/HIVE-26968 in the
coming weeks; I have assigned myself as reviewer in the PR.

Can some other people (committers or not) help in reviewing the
remaining TPC-DS blockers for which we have a PR?

Reminder: Good non-binding reviews are important and much appreciated
by the community. They are also among the important metrics for
becoming a Hive committer/PMC [1].

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter

On Tue, Mar 14, 2023 at 12:07 PM Sungwoo Park <c...@pl.postech.ac.kr> wrote:

Hello,

I would like to expand the list of blockers with HIVE-27138 [1], which fixes an
NPE in mapjoin_filter_on_outerjoin.q.

Currently mapjoin_filter_on_outerjoin.q is tested with the MapReduce execution
engine and shows no problem. However, it shows a few problems when tested with
the Tez execution engine. HIVE-27138 is the first fix found after analyzing
mapjoin_filter_on_outerjoin.q, and Seonggon will create a couple more tickets
later.

In the meantime, it would be great if someone could review pull requests for
subtasks in HIVE-26654. (I moved to HIVE-26654 three tickets that I previously
requested code review for.)

Best,

--- Sungwoo
  [1] https://issues.apache.org/jira/browse/HIVE-27138

On Fri, 10 Mar 2023, Stamatis Zampetakis wrote:

Hi Kirti,

Thanks for bringing up this topic.

The master branch already has many new features; we don't need to wait for
more to cut a GA.

The main criterion for going GA is stability; thus I would consider
regressions as the only blockers for the release.

If I recall well the only regressions discovered so far are some problems
with TPC-DS queries so basically HIVE-26654 [1].

I will let others chime in to include more tickets if necessary.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26654


On Wed, Mar 8, 2023 at 10:02 AM Kirti Ruge <kirtirug...@gmail.com> wrote:

Hello Hive Dev,

It has been about 6 months since Hive-4.0-alpha-2 was released in Nov 2022.
Would it be a good time to discuss the HIVE-4.0 GA release with the
community? Can we have a discussion on the new features and JDK versions
that we want to publish as part of 4.0 GA, and on the timeframe of the release?


Thanks,
Kirti

