[ https://issues.apache.org/jira/browse/HIVE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zengxl updated HIVE-21452: -------------------------- Affects Version/s: (was: 2.3.4) (was: 3.1.0) > Loss of query condition when exist exists > ----------------------------------------- > > Key: HIVE-21452 > URL: https://issues.apache.org/jira/browse/HIVE-21452 > Project: Hive > Issue Type: Bug > Affects Versions: 1.1.0 > Reporter: zengxl > Priority: Major > > In our production environment, there are four tables to do association > queries. There are exists in the conditions, and we found that the first two > conditions were lost. > The following two conditions are missing: > {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color} > > In the test environment, I prepared the data of the following four tables, > and a condition was lost in the simulation. > tables: > test_table1 > cust_no,name > 60001,lisa > 60002,tina > 60003,kylin > 60004,jeny > 60005,john > 60006,jamse > test_table2 > cust_no,acct_type > 60001,1 > 60001,1 > 60001,2 > 60002,1 > 60003,2 > 60003,3 > test_table3 > cust_no > 60001 > 60002 > 60003 > 60004 > 60005 > 60007 > test_table4 > cust_no,cust_level > 60001,1 > 60002,2 > 60003,3 > 60004,4 > 60005,5 > > create table tmp.test_table1(cust_no string,name string); > create table tmp.test_table2(cust_no string,acct_type string); > create table tmp.test_table3(cust_no string); > create table tmp.test_table4(cust_no string,cust_level string); > insert into tmp.test_table1 select '60001','lisa'; > insert into tmp.test_table1 select '60002','tina'; > insert into tmp.test_table1 select '60003','kylin'; > insert into tmp.test_table1 select '60004','jeny'; > insert into tmp.test_table1 select '60005','john'; > insert into tmp.test_table1 select '60006','jamse'; > insert into tmp.test_table2 select '60001','1'; > insert into tmp.test_table2 select '60001','1'; > insert into tmp.test_table2 select '60001','2'; > insert into tmp.test_table2 select '60002','1'; > insert into tmp.test_table2 select '60003','2'; > insert into tmp.test_table2 select '60002','3'; > insert into tmp.test_table3 select '60001'; > insert into tmp.test_table3 select '60002'; > insert into tmp.test_table3 select '60003'; > insert into tmp.test_table3 select '60004'; > insert into tmp.test_table3 select '60005'; > insert into tmp.test_table3 select '60007'; > insert into tmp.test_table4 select '60001','1'; > insert into tmp.test_table4 select '60002','2'; > insert into tmp.test_table4 select '60003','3'; > insert into tmp.test_table4 select '60004','4'; > insert into tmp.test_table4 select '60005','5'; > > Here is my query SQL And shut down mapjoin: > set hive.auto.convert.join=false; > select t1.cust_no as cust_no,t2.cust_no as custNO,t1.name from > tmp.test_table1 t1 > left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no > and t2.acct_type='1' > left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no > where t2.cust_no is null and t4.cust_level not in ('4','5') and exists > (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no) > > All I want is to include cust_no for 6003,But the result is inclusive 6004 > and 6005,this wrong 。{color:#f79232}In my production environment, 6001 came > out。Loss of condition because cust_no is null。{color} > {color:#f6c342}View the execution plan, t4.cust_level not in ('4','5') > condition missing{color} > *{color:#f6c342}Explain:{color}* > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1 > Stage-0 depends on stages: Stage-2 > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Map Operator Tree: > TableScan > alias: t1 > Statistics: Num rows: 12 Data size: 2128 Basic stats: COMPLETE Column stats: > NONE > Filter Operator > predicate: cust_no is not null (type: boolean) > Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: > NONE > Reduce Output Operator > key expressions: cust_no (type: string) > sort order: + > Map-reduce partition columns: cust_no (type: string) > Statistics: Num rows: 6 Data size: 1064 Basic stats: COMPLETE Column stats: > NONE > value expressions: name (type: string) > TableScan > alias: t2 > Statistics: Num rows: 12 Data size: 2088 Basic stats: COMPLETE Column stats: > NONE > Filter Operator > predicate: ((acct_type = '1') and cust_no is not null) (type: boolean) > Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: > NONE > Reduce Output Operator > key expressions: cust_no (type: string) > sort order: + > Map-reduce partition columns: cust_no (type: string) > Statistics: Num rows: 3 Data size: 522 Basic stats: COMPLETE Column stats: > NONE > TableScan > alias: t4 > Statistics: Num rows: 10 Data size: 1740 Basic stats: COMPLETE Column stats: > NONE > Filter Operator > predicate: cust_no is not null (type: boolean) > Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: > NONE > Reduce Output Operator > key expressions: cust_no (type: string) > sort order: + > Map-reduce partition columns: cust_no (type: string) > Statistics: Num rows: 5 Data size: 870 Basic stats: COMPLETE Column stats: > NONE > Reduce Operator Tree: > Join Operator > condition map: > Left Outer Join0 to 1 > Left Outer Join0 to 2 > keys: > 0 cust_no (type: string) > 1 cust_no (type: string) > 2 cust_no (type: string) > outputColumnNames: _col0, _col1, _col5 > Statistics: Num rows: 13 Data size: 2340 Basic stats: COMPLETE Column stats: > NONE > Filter Operator > predicate: _col5 is null (type: boolean) > Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats: > NONE > File Output Operator > compressed: false > table: > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe > Stage: Stage-2 > Map Reduce > Map Operator Tree: > TableScan > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: string) > Statistics: Num rows: 6 Data size: 1080 Basic stats: COMPLETE Column stats: > NONE > value expressions: _col1 (type: string) > TableScan > alias: t3 > Statistics: Num rows: 12 Data size: 1068 Basic stats: COMPLETE Column stats: > NONE > Filter Operator > predicate: cust_no is not null (type: boolean) > Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: > NONE > Select Operator > expressions: cust_no (type: string) > outputColumnNames: _col1 > Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: > NONE > Group By Operator > keys: _col1 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: > NONE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: string) > Statistics: Num rows: 6 Data size: 534 Basic stats: COMPLETE Column stats: > NONE > Reduce Operator Tree: > Join Operator > condition map: > Left Semi Join 0 to 1 > keys: > 0 _col0 (type: string) > 1 _col0 (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: > NONE > Select Operator > expressions: _col0 (type: string), null (type: string), _col1 (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: > NONE > File Output Operator > compressed: false > Statistics: Num rows: 6 Data size: 1188 Basic stats: COMPLETE Column stats: > NONE > table: > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > > {color:#f6c342}*result:*{color} > 60003 NULL kylin > 60004 NULL jeny > 60005 NULL john > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)