[ https://issues.apache.org/jira/browse/HIVE-22489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa updated HIVE-22489:
----------------------------------
    Attachment: HIVE-22489.8.patch

> Reduce Sink operator should order nulls by parameter
> -----------------------------------------------------
>
>                 Key: HIVE-22489
>                 URL: https://issues.apache.org/jira/browse/HIVE-22489
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>         Attachments: HIVE-22489.1.patch, HIVE-22489.2.patch,
> HIVE-22489.3.patch, HIVE-22489.3.patch, HIVE-22489.4.patch,
> HIVE-22489.5.patch, HIVE-22489.6.patch, HIVE-22489.7.patch, HIVE-22489.8.patch
>
>
> When the property hive.default.nulls.last is set to true and no null order is
> explicitly specified in the ORDER BY clause of the query, null ordering should
> be NULLS LAST.
> But some of the Reduce Sink operators still order nulls first.
> {code}
> SET hive.default.nulls.last=true;
> EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key LIMIT 5;
> {code}
> {code}
> PREHOOK: query: EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
> PREHOOK: type: QUERY
> PREHOOK: Input: default@src
> #### A masked pattern was here ####
> POSTHOOK: query: EXPLAIN EXTENDED
> SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@src
> #### A masked pattern was here ####
> OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
> FROM (SELECT `key`
> FROM `default`.`src`
> WHERE `key` IS NOT NULL) AS `t0`
> INNER JOIN (SELECT `key`, `value`
> FROM `default`.`src`
> WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
> ORDER BY `t0`.`key`
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: src1
>                   filterExpr: key is not null (type: boolean)
>                   Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
>                   GatherStats: false
>                   Filter Operator
>                     isSamplingPred: false
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: key (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         null sort order: a
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
>                         tag: 0
>                         auto parallelism: true
>             Execution mode: vectorized, llap
>             LLAP IO: no inputs
>             Path -> Alias:
> #### A masked pattern was here ####
>             Path -> Partition:
> #### A masked pattern was here ####
>                 Partition
>                   base file name: src
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   properties:
>                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
>                     bucket_count -1
>                     bucketing_version 2
>                     column.name.delimiter ,
>                     columns key,value
>                     columns.comments 'default','default'
>                     columns.types string:string
> #### A masked pattern was here ####
>                     name default.src
>                     numFiles 1
>                     numRows 500
>                     rawDataSize 5312
>                     serialization.ddl struct src { string key, string value}
>                     serialization.format 1
>                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     totalSize 5812
> #### A masked pattern was here ####
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
>                       bucket_count -1
>                       bucketing_version 2
>                       column.name.delimiter ,
>                       columns key,value
>                       columns.comments 'default','default'
>                       columns.types string:string
> #### A masked pattern was here ####
>                       name default.src
>                       numFiles 1
>                       numRows 500
>                       rawDataSize 5312
>                       serialization.ddl struct src { string key, string value}
>                       serialization.format 1
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       totalSize 5812
> #### A masked pattern was here ####
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: default.src
>                   name: default.src
>             Truncated Path -> Alias:
>               /src [src1]
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: src2
>                   filterExpr: key is not null (type: boolean)
>                   Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
>                   GatherStats: false
>                   Filter Operator
>                     isSamplingPred: false
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: key (type: string), value (type: string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         null sort order: a
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
>                         tag: 1
>                         value expressions: _col1 (type: string)
>                         auto parallelism: true
>             Execution mode: vectorized, llap
>             LLAP IO: no inputs
>             Path -> Alias:
> #### A masked pattern was here ####
>             Path -> Partition:
> #### A masked pattern was here ####
>                 Partition
>                   base file name: src
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   properties:
>                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
>                     bucket_count -1
>                     bucketing_version 2
>                     column.name.delimiter ,
>                     columns key,value
>                     columns.comments 'default','default'
>                     columns.types string:string
> #### A masked pattern was here ####
>                     name default.src
>                     numFiles 1
>                     numRows 500
>                     rawDataSize 5312
>                     serialization.ddl struct src { string key, string value}
>                     serialization.format 1
>                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     totalSize 5812
> #### A masked pattern was here ####
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
>                       bucket_count -1
>                       bucketing_version 2
>                       column.name.delimiter ,
>                       columns key,value
>                       columns.comments 'default','default'
>                       columns.types string:string
> #### A masked pattern was here ####
>                       name default.src
>                       numFiles 1
>                       numRows 500
>                       rawDataSize 5312
>                       serialization.ddl struct src { string key, string value}
>                       serialization.format 1
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       totalSize 5812
> #### A masked pattern was here ####
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: default.src
>                   name: default.src
>             Truncated Path -> Alias:
>               /src [src2]
>         Reducer 2
>             Execution mode: llap
>             Needs Tagging: false
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 _col0 (type: string)
>                   1 _col0 (type: string)
>                 outputColumnNames: _col0, _col2
>                 Position of Big Table: 1
>                 Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: string), _col2 (type: string)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: string)
>                     null sort order: z
>                     sort order: +
>                     Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
>                     tag: -1
>                     value expressions: _col1 (type: string)
>                     auto parallelism: false
>         Reducer 3
>             Execution mode: vectorized, llap
>             Needs Tagging: false
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   GlobalTableId: 0
> #### A masked pattern was here ####
>                   NumFilesPerFileSink: 1
>                   Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
> #### A masked pattern was here ####
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       properties:
>                         columns _col0,_col1
>                         columns.types string:string
>                         escape.delim \
>                         hive.serialization.extend.additional.nesting.levels true
>                         serialization.escape.crlf true
>                         serialization.format 1
>                         serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                   TotalFiles: 1
>                   GatherStats: false
>                   MultiFileSpray: false
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)