[ https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088071#comment-16088071 ]
Vineet Garg commented on HIVE-15758: ------------------------------------ [~pxiong] For the query mentioned above in this JIRA hive generates following plan {noformat} Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 2 File Output Operator [FS_24] Select Operator [SEL_23] (rows=185 width=244) Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] Filter Operator [FIL_22] (rows=185 width=244) predicate:(UDFToLong(_col5) <> CASE WHEN (_col10 is null) THEN (0) ELSE (_col9) END) Merge Join Operator [MERGEJOIN_29] (rows=185 width=244) Conds:RS_19._col4=RS_20._col2(Left Outer),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"] <-Map 1 [SIMPLE_EDGE] SHUFFLE [RS_19] PartitionCols:_col4 Select Operator [SEL_1] (rows=26 width=121) Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] TableScan [TS_0] (rows=26 width=121) default@part,part,Tbl:COMPLETE,Col:NONE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] <-Reducer 5 [SIMPLE_EDGE] SHUFFLE [RS_20] PartitionCols:_col2 Select Operator [SEL_18] (rows=169 width=243) Output:["_col0","_col1","_col2"] Group By Operator [GBY_17] (rows=169 width=243) Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 <-Reducer 4 [SIMPLE_EDGE] SHUFFLE [RS_16] PartitionCols:_col0 Group By Operator [GBY_15] (rows=338 width=243) Output:["_col0","_col1"],aggregations:["count(_col1)"],keys:_col2 Select Operator [SEL_14] (rows=338 width=243) Output:["_col1","_col2"] Filter Operator [FIL_13] (rows=338 width=243) predicate:(_col2 <> _col0) Merge Join Operator [MERGEJOIN_28] (rows=338 width=243) Conds:(Inner),Output:["_col0","_col1","_col2"] <-Map 3 [CUSTOM_SIMPLE_EDGE] PARTITION_ONLY_SHUFFLE [RS_10] Select Operator [SEL_3] (rows=26 width=121) Output:["_col0","_col1"] TableScan [TS_2] (rows=26 width=121) default@part,pp,Tbl:COMPLETE,Col:NONE,Output:["p_type","p_size"] <-Reducer 7 [CUSTOM_SIMPLE_EDGE] PARTITION_ONLY_SHUFFLE [RS_11] Group By Operator [GBY_8] (rows=13 width=121) Output:["_col0"],keys:KEY._col0 <-Map 6 [SIMPLE_EDGE] SHUFFLE [RS_7] PartitionCols:_col0 Group By Operator [GBY_6] (rows=26 width=121) Output:["_col0"],keys:p_type TableScan [TS_4] (rows=26 width=121) default@part,part,Tbl:COMPLETE,Col:NONE,Output:["p_type"] {noformat} First outer table p is joined with inner table pp to come up with all possible values where correlated condition {{pp.p_type <> p.p_type}} is true. Then group by is done on this to do count(). This is then further joined (Left outer) with outer table with a filter on top taking care of case when right side is null (count is suppose to return 0 instead of null for empty rows) along with whatever subquery condition there is. Briefly we re-write such queries into left outer join with a filter on top. Hope this makes sense. > Allow correlated scalar subqueries with aggregates which has non-equi join > predicates > ------------------------------------------------------------------------------------- > > Key: HIVE-15758 > URL: https://issues.apache.org/jira/browse/HIVE-15758 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer > Reporter: Vineet Garg > Assignee: Vineet Garg > Labels: sub-query > Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch > > > Queries such as > {code} select * from part where p_size <> (select count(p_size) from part pp > where part.p_type <> pp.p_type); {code} are currently not allowed since HIVE > doesn't know how to rewrite such queries to preserve the correctness for > cases when there is zero row -- This message was sent by Atlassian JIRA (v6.4.14#64029)