Xuefu Zhang created HIVE-8700:
---------------------------------

             Summary: Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
                 Key: HIVE-8700
                 URL: https://issues.apache.org/jira/browse/HIVE-8700
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Xuefu Zhang
            Assignee: Szehon Ho

With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for each of the small tables. For example, the following is the operator plan for the small table dec1, derived from the query
{code}
explain select /*+ MAPJOIN(dec)*/ * from dec join dec1 on dec.value=dec1.d;
{code}

{code}
        Map 2
            Map Operator Tree:
                TableScan
                  alias: dec1
                  Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
                  Filter Operator
                    predicate: d is not null (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Reduce Output Operator
                      key expressions: d (type: decimal(5,2))
                      sort order: +
                      Map-reduce partition columns: d (type: decimal(5,2))
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      value expressions: i (type: int)
{code}

With the new design for broadcasting small tables, we need to replace the ReduceSinkOperator with a HashTableSinkOperator, or an equivalent, in the new plan.
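
To illustrate the shape of the intended rewrite, here is a minimal sketch on a toy operator tree. It is not Hive code: PlanOp and PlanRewriteSketch are hypothetical names made up for this example, the operator kinds are plain strings, and a real change would work on Hive's own operator plan and would have to restrict itself to the small-table branches of a map join.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy stand-in for a plan operator; NOT Hive's Operator class.
final class PlanOp {
    final String kind;                                // e.g. "TableScan", "ReduceSink"
    final List<PlanOp> children = new ArrayList<>();

    PlanOp(String kind, PlanOp... kids) {
        this.kind = kind;
        for (PlanOp k : kids) {
            children.add(k);
        }
    }

    // Renders the operator kinds from this node downwards, joined by " -> ".
    String describe() {
        StringBuilder sb = new StringBuilder(kind);
        for (PlanOp c : children) {
            sb.append(" -> ").append(c.describe());
        }
        return sb.toString();
    }
}

// Hypothetical rewrite pass over the toy tree above.
final class PlanRewriteSketch {
    // Returns a copy of the tree in which every "ReduceSink" node
    // becomes a "HashTableSink" node; the input tree is left untouched.
    static PlanOp replaceReduceSink(PlanOp op) {
        String kind = op.kind.equals("ReduceSink") ? "HashTableSink" : op.kind;
        PlanOp copy = new PlanOp(kind);
        for (PlanOp c : op.children) {
            copy.children.add(replaceReduceSink(c));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Mirrors the small-table branch shown in the plan: scan, filter, sink.
        PlanOp smallTable =
            new PlanOp("TableScan", new PlanOp("Filter", new PlanOp("ReduceSink")));
        System.out.println(replaceReduceSink(smallTable).describe());
        // prints "TableScan -> Filter -> HashTableSink"
    }
}
```

The sketch copies the tree instead of mutating it only to keep the example easy to check; whether the actual patch rewrites the plan in place is not stated in this issue.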