[jira] [Created] (HIVE-28729) Apply nulls order setting in Reduce Sink operator of join branches

Krisztian Kasa (Jira) Tue, 28 Jan 2025 07:26:05 -0800

Krisztian Kasa created HIVE-28729:
-------------------------------------

             Summary: Apply nulls order setting in Reduce Sink operator of join branches
                 Key: HIVE-28729
                 URL: https://issues.apache.org/jira/browse/HIVE-28729
             Project: Hive
          Issue Type: Sub-task
            Reporter: Krisztian Kasa



{code:java}
set hive.default.nulls.last=false;

create table t1(key int, value string);

EXPLAIN SELECT sum(hash(a.key,a.value,b.key,b.value))
FROM t1 a INNER JOIN t1 b on a.key = b.key;
{code}
{code:java}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int), value (type: string)
                      outputColumnNames: key, value
                      Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: key (type: int)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: key (type: int)
                        Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                        value expressions: value (type: string)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: int), value (type: string)
                      outputColumnNames: key, value
                      Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: key (type: int)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: key (type: int)
                        Statistics: Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: NONE
                        value expressions: value (type: string)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2 
            Execution mode: llap
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 key (type: int)
                  1 key (type: int)
                outputColumnNames: key, value, key0, value0
                Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: hash(key,value,key0,value0) (type: int)
                  outputColumnNames: $f0
                  Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
                  Group By Operator
                    aggregations: sum($f0)
                    minReductionHashAggr: 0.99
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      null sort order: 
                      sort order: 
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: bigint)
        Reducer 3 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                mode: mergepartial
                outputColumnNames: $f0
                Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
The null sort order in the RS operators is NULLS LAST ({{z}}), but it should be
NULLS FIRST because of the setting {{hive.default.nulls.last=false}}:
{code}
        Map 1 
            Map Operator Tree:
            ...
                      Reduce Output Operator
                        key expressions: key (type: int)
                        null sort order: z
            ...
{code}
{code}
        Map 4 
            Map Operator Tree:
            ...
                      Reduce Output Operator
                        key expressions: key (type: int)
                        null sort order: z
            ...
{code} 
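
For illustration only, a minimal sketch of the expected mapping from the
setting to the code printed on the "null sort order" line of the plan. The
class and method names below are hypothetical and this is not Hive's actual
implementation; it only assumes the usual EXPLAIN encoding, where "a" means
NULLS FIRST and "z" means NULLS LAST, with one code per key column.

```java
// Hypothetical sketch; NullSortOrderSketch is not a Hive class.
final class NullSortOrderSketch {

    // Codes as they appear in EXPLAIN output.
    static final char NULLS_FIRST = 'a';
    static final char NULLS_LAST = 'z';

    /**
     * Builds the "null sort order" string for a Reduce Sink with the given
     * number of key columns, driven by the value of hive.default.nulls.last.
     */
    static String nullSortOrder(boolean defaultNullsLast, int keyColumnCount) {
        char code = defaultNullsLast ? NULLS_LAST : NULLS_FIRST;
        StringBuilder order = new StringBuilder(keyColumnCount);
        for (int i = 0; i < keyColumnCount; i++) {
            order.append(code);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        // hive.default.nulls.last=false, single join key: expected plan output
        System.out.println("null sort order: " + nullSortOrder(false, 1));
        // hive.default.nulls.last=true, single join key: what the plan shows today
        System.out.println("null sort order: " + nullSortOrder(true, 1));
    }
}
```

With hive.default.nulls.last=false and the single join key used above, the
sketch yields "a" for both Map 1 and Map 4, rather than the "z" currently
shown in the plan.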




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
