-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71995/
-----------------------------------------------------------
Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Krisztian Kasa.
Bugs: HIVE-22726
https://issues.apache.org/jira/browse/HIVE-22726
Repository: hive-git
Description
-------
The TopN key optimizer currently uses a priority queue to keep track of the
largest/smallest rows. Its maximum size is the same as the user-specified limit.
This should be replaced with a more cache-line-friendly array with a small (128)
maximum size, and the resulting performance gain should be measured.
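As a rough illustration of the idea, the sketch below keeps the tracked keys in
a small sorted array instead of a priority queue. It is not the code from the
patch: the class name ArrayTopNFilter, the canForward method, and the generic
key type are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of an array-backed top-N key filter; not the class from the patch. */
final class ArrayTopNFilter<K> {
    private final K[] keys;                       // sorted ascending; the first `size` slots are in use
    private final Comparator<? super K> comparator;
    private int size;

    @SuppressWarnings("unchecked")
    ArrayTopNFilter(int topN, Comparator<? super K> comparator) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.keys = (K[]) new Object[topN];       // small contiguous array, e.g. at most 128 entries
        this.comparator = comparator;
    }

    /** Returns true if a row with this key may still belong to the top N. */
    boolean canForward(K key) {
        int pos = Arrays.binarySearch(keys, 0, size, key, comparator);
        if (pos >= 0) {
            return true;                          // key is already tracked
        }
        int insertAt = -(pos + 1);
        if (insertAt == keys.length) {
            return false;                         // array is full and the key sorts after every tracked key
        }
        // Shift the tail right by one slot; when the array is full the last key falls off.
        int moved = Math.min(size, keys.length - 1) - insertAt;
        System.arraycopy(keys, insertAt, keys, insertAt + 1, moved);
        keys[insertAt] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }
}
```

With a small fixed capacity, the shift on insert touches only a few adjacent
slots, which is the cache-friendliness argument made above. Whether this beats
a heap in practice is what the measurements are meant to show.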
Diffs
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e7724f9084f
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java 4998766f064
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java b7c12502204
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 5faa038c18d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java ce6efa49192
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ff815434f0c
Diff: https://reviews.apache.org/r/71995/diff/1/
Testing
-------
Tested with the following query:
use tpcds_bin_partitioned_orc_100;
set hive.optimize.topnkey=true;
set hive.optimize.topnkey.max=5;
select i_item_id,
s_state, grouping(s_state) g_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, store, item
where ss_sold_date_sk = d_date_sk and
ss_item_sk = i_item_sk and
ss_store_sk = s_store_sk and
ss_cdemo_sk = cd_demo_sk
group by rollup (i_item_id, s_state)
order by i_item_id, s_state
limit 5;
Results (two runs each with the optimization enabled and disabled):
enabled: 5 rows selected (715.26 seconds)
enabled: 5 rows selected (605.888 seconds)
disabled: 5 rows selected (1208.168 seconds)
disabled: 5 rows selected (1219.482 seconds)
Thanks,
Attila Magyar