[ https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vineet Garg updated HIVE-21539:
-------------------------------
    Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> ------------------------------------------------------------------------
>
>                 Key: HIVE-21539
>                 URL: https://issues.apache.org/jira/browse/HIVE-21539
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: anishek
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values (1, 'a'), (2, 'b');
> explain extended select min(j) from a where j='a' group by j;
>
> OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`
> FROM `default`.`a`
> WHERE `j` = 'a'
> GROUP BY TRUE
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: (j = 'a') (type: boolean)
>                   Statistics: Num rows: 2 Data size: 170 Basic stats: COMPLETE Column stats: COMPLETE
>                   GatherStats: false
>                   Filter Operator
>                     isSamplingPred: false
>                     predicate: (j = 'a') (type: boolean)
>                     Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE
>                       Group By Operator
>                         aggregations: min(true)
>                         keys: true (type: boolean)
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: boolean)
>                           null sort order: a
>                           sort order: +
>                           Map-reduce partition columns: _col0 (type: boolean)
>                           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                           tag: -1
>                           value expressions: _col1 (type: boolean)
>                           auto parallelism: true
>             Path -> Alias:
>               hdfs://localhost:9000/tmp/hive/warehouse/a [a]
>             Path -> Partition:
>               hdfs://localhost:9000/tmp/hive/warehouse/a
>                 Partition
>                   base file name: a
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   properties:
>                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}}
>                     bucket_count -1
>                     bucketing_version 2
>                     column.name.delimiter ,
>                     columns i,j
>                     columns.comments
>                     columns.types int:string
>                     file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                     file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     location hdfs://localhost:9000/tmp/hive/warehouse/a
>                     name default.a
>                     numFiles 3
>                     numRows 2
>                     rawDataSize 6
>                     serialization.ddl struct a { i32 i, string j}
>                     serialization.format 1
>                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     totalSize 16
>                     transient_lastDdlTime 1552903148
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}}
>                       bucket_count -1
>                       bucketing_version 2
>                       column.name.delimiter ,
>                       columns i,j
>                       columns.comments
>                       columns.types int:string
>                       file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                       file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       location hdfs://localhost:9000/tmp/hive/warehouse/a
>                       name default.a
>                       numFiles 3
>                       numRows 2
>                       rawDataSize 6
>                       serialization.ddl struct a { i32 i, string j}
>                       serialization.format 1
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       totalSize 16
>                       transient_lastDdlTime 1552903148
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: default.a
>                   name: default.a
>             Truncated Path -> Alias:
>               /a [a]
>         Reducer 2
>             Needs Tagging: false
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0)
>                 keys: KEY._col0 (type: boolean)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col1 (type: boolean)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     GlobalTableId: 0
>                     directory: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002
>                     NumFilesPerFileSink: 1
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                     Stats Publishing Key Prefix: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002/
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         properties:
>                           columns _col0
>                           columns.types boolean
>                           escape.delim \
>                           hive.serialization.extend.additional.nesting.levels true
>                           serialization.escape.crlf true
>                           serialization.format 1
>                           serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     TotalFiles: 1
>                     GatherStats: false
>                     MultiFileSpray: false
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> The query is rewritten with *true* as the value of column {{j}}: the optimized SQL computes {{MIN(TRUE)}} and groups by {{TRUE}}, and the final Select Operator emits a boolean ({{_col1 (type: boolean)}}). The query therefore returns {{true}} instead of the expected string {{'a'}}.
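For comparison, the result that any correct rewrite of this query must preserve can be checked outside Hive. Below is a minimal sketch, assuming Python's standard-library sqlite3 module as a stand-in reference engine (SQLite is not Hive and does not exercise the CBO rewrite at all); the helper name run_repro is hypothetical. It builds the same table and rows, runs the same query, and also reports SQLite's storage type for the aggregate so the string-vs-boolean difference is visible.

```python
import sqlite3


def run_repro():
    """Run the HIVE-21539 reproduction query on an in-memory SQLite database.

    Assumption: SQLite is only a stand-in reference engine. It is not Hive, so
    this checks the expected query result, not Hive's CBO rewrite.
    """
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Same schema and rows as the reproduction (Hive's string maps to TEXT here).
        cur.execute("CREATE TABLE a (i INTEGER, j TEXT)")
        cur.executemany("INSERT INTO a VALUES (?, ?)", [(1, "a"), (2, "b")])
        # The filter column and the grouping key are the same column, as in the report.
        # typeof() is SQLite-specific; it shows the aggregate is text, not a boolean.
        cur.execute(
            "SELECT MIN(j), typeof(MIN(j)) FROM a WHERE j = 'a' GROUP BY j"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_repro())
```

On the data above this returns [('a', 'text')]: a single group whose value is the string 'a', whereas the rewritten Hive plan in the report emits a boolean.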