[ https://issues.apache.org/jira/browse/HIVE-16852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-16852: --------------------------- Labels: performance tpcds (was: ) > PTF: RANK() re-evaluates order predicates on the reducer > -------------------------------------------------------- > > Key: HIVE-16852 > URL: https://issues.apache.org/jira/browse/HIVE-16852 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer > Affects Versions: 2.1.1, 3.0.0 > Reporter: Gopal V > Labels: performance, tpcds > > {code} > explain select ss_item_sk, rank() over(order by cast(ss_list_price as > decimal(38,10))) as r , ss_list_price from store_sales; > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3 > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > DagName: > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: store_sales > Statistics: Num rows: 28800426268 Data size: 450435120648 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: 0 (type: int), CAST( ss_list_price AS > decimal(38,10)) (type: decimal(38,10)) > sort order: ++ > Map-reduce partition columns: 0 (type: int) > Statistics: Num rows: 28800426268 Data size: 450435120648 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: ss_item_sk (type: bigint), > ss_list_price (type: double) > Execution mode: vectorized, llap > LLAP IO: all inputs > Reducer 2 > Execution mode: llap > Reduce Operator Tree: > Select Operator > expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: > double) > outputColumnNames: _col1, _col11 > Statistics: Num rows: 28800426268 Data size: 8399352770616 > Basic stats: COMPLETE Column stats: COMPLETE > PTF Operator > Function definitions: > Input definition > input alias: ptf_0 > output shape: _col1: bigint, _col11: double > type: WINDOWING > Windowing table definition > input alias: ptf_1 > name: windowingtablefunction > order by: CAST( _col11 AS decimal(38,10)) ASC NULLS > FIRST > partition by: 0 > raw input shape: > window functions: > window function definition > alias: rank_window_0 > arguments: CAST( _col11 AS decimal(38,10)) > name: rank > window function: GenericUDAFRankEvaluator > window frame: PRECEDING(MAX)~FOLLOWING(MAX) > isPivotResult: true > Statistics: Num rows: 28800426268 Data size: 8399352770616 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col1 (type: bigint), rank_window_0 (type: > int), _col11 (type: double) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 28800426268 Data size: 565636825720 > Basic stats: COMPLETE Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 28800426268 Data size: > 565636825720 Basic stats: COMPLETE Column stats: COMPLETE > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > {code} > This forces the Decimal cast to be evaluated ~2x - once to produce the KEY > expression and once within the window function. -- This message was sent by Atlassian JIRA (v6.3.15#6346)