[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Pivovarov updated HIVE-10841: --------------------------------------- Summary: [WHERE col is not null] does not work sometimes for queries with many JOIN statements (was: [WHERE col is not null] does not work for large queries) > [WHERE col is not null] does not work sometimes for queries with many JOIN > statements > ------------------------------------------------------------------------------------- > > Key: HIVE-10841 > URL: https://issues.apache.org/jira/browse/HIVE-10841 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Alexander Pivovarov > > The result from the following SELECT query is 3 rows but it should be 1 row. > I checked it in MySQL - it returned 1 row. > To reproduce the issue in Hive > 1. prepare tables > {code} > drop table if exists L; > drop table if exists LA; > drop table if exists FR; > drop table if exists A; > drop table if exists PI; > drop table if exists acct; > create table L as select 4436 id; > create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; > create table FR as select 4436 loan_id; > create table A as select 4748 id; > create table PI as select 4415 id; > create table acct as select 4748 aid, 10 acc_n, 122 brn; > insert into table acct values(4748, null, null); > insert into table acct values(4748, null, null); > {code} > 2. run SELECT query > {code} > select > acct.ACC_N, > acct.brn > FROM L > JOIN LA ON L.id = LA.loan_id > JOIN FR ON L.id = FR.loan_id > JOIN A ON LA.aid = A.id > JOIN PI ON PI.id = LA.pi_id > JOIN acct ON A.id = acct.aid > WHERE > L.id = 4436 > and acct.brn is not null; > {code} > the result is 3 rows > {code} > 10 122 > NULL NULL > NULL NULL > {code} > but it should be 1 row > {code} > 10 122 > {code} > 3. The workaround is to put "acct.brn is not null" to join condition > {code} > select > acct.ACC_N, > acct.brn > FROM L > JOIN LA ON L.id = LA.loan_id > JOIN FR ON L.id = FR.loan_id > JOIN A ON LA.aid = A.id > JOIN PI ON PI.id = LA.pi_id > JOIN acct ON A.id = acct.aid and acct.brn is not null > WHERE > L.id = 4436; > OK > 10 122 > Time taken: 23.479 seconds, Fetched: 1 row(s) > {code} > I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations > have the issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)