-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
-----------------------------------------------------------
(Updated Jan. 29, 2020, 2:23 p.m.)
Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.
Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726
Repository: hive-git
Description
-------
The TopN key optimizer currently uses a priority queue to keep track of the
largest/smallest rows. Its maximum size is the same as the user-specified limit.
This should be replaced with a more cache-line-friendly array with a small (128)
maximum size, to see how much performance is gained.
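
To illustrate the idea, here is a minimal sketch of a filter that keeps the N
smallest keys in a small sorted array instead of a priority queue. It is not
the code from this patch: the class name SortedArrayTopN and the method
canForward are hypothetical, and the real TopNKeyFilter may differ in how it
handles duplicates, nulls and ordering.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not the TopNKeyFilter from this patch.
// Tracks the N smallest distinct keys (per the comparator) in a sorted array.
final class SortedArrayTopN<T> {
    private final T[] keys;                      // sorted ascending, first 'size' slots used
    private final Comparator<? super T> comparator;
    private int size;

    @SuppressWarnings("unchecked")
    SortedArrayTopN(int capacity, Comparator<? super T> comparator) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.keys = (T[]) new Object[capacity];
        this.comparator = comparator;
    }

    // Returns true if the key is among the top N keys seen so far,
    // i.e. a row with this key may still be part of the final result.
    boolean canForward(T key) {
        int pos = Arrays.binarySearch(keys, 0, size, key, comparator);
        if (pos >= 0) {
            return true;                         // key is already tracked
        }
        int insertAt = -pos - 1;
        if (insertAt == keys.length) {
            return false;                        // array is full and key sorts after every tracked key
        }
        // Shift the tail right by one slot; when full, the last key falls off.
        int toMove = Math.min(size, keys.length - 1) - insertAt;
        System.arraycopy(keys, insertAt, keys, insertAt + 1, toMove);
        keys[insertAt] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }
}
```

With a capacity of 2 and natural ordering, the keys 5 and 3 are forwarded, 7
is then rejected, and a later 1 evicts 5. Insertion is O(N) because of the
shift, but for a small N the array stays within a few cache lines, which is
the trade-off this change is meant to measure.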
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e3ee06ab5fa
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 0ccaeea1da5
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java 0786c82b7be
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 8cb48473785
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java a9ff6b4a830
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c
ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java fce850f4fc2
Diff: https://reviews.apache.org/r/71995/diff/4/
Changes: https://reviews.apache.org/r/71995/diff/3-4/
Testing
-------
Tested with the following query:
use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;
select i_item_id,
       s_state, grouping(s_state) g_state,
       avg(ss_quantity) agg1,
       avg(ss_list_price) agg2,
       avg(ss_coupon_amt) agg3,
       avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, store, item
where ss_sold_date_sk = d_date_sk and
      ss_item_sk = i_item_sk and
      ss_store_sk = s_store_sk and
      ss_cdemo_sk = cd_demo_sk
group by rollup (i_item_id, s_state)
order by i_item_id,
         s_state
limit 5;
Results:
enabled: 5 rows selected (715.26 seconds)
enabled: 5 rows selected (605.888 seconds)
disabled: 5 rows selected (1208.168 seconds)
disabled: 5 rows selected (1219.482 seconds)
Thanks,
Attila Magyar