Hi Ashutosh, Is Hive-10841 related? from the title of that jira, it sais “where col is not null”caused the issue; however above reproduce did not have that clause.
On Wed, Sep 2, 2015 at 2:24 AM, Ashutosh Chauhan <hashut...@apache.org> wrote: > https://issues.apache.org/jira/browse/HIVE-10841 > > Thanks, > Ashutosh > > On Tue, Sep 1, 2015 at 6:00 PM, Jim Green <openkbi...@gmail.com> wrote: > >> Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related >> and the possibility to backport this fix into Hive 0.13? >> >> >> On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbi...@gmail.com> wrote: >> >>> Hi Team, >>> >>> Below is the minimum reproduce of wrong results in Hive 0.13: >>> >>> *1. Create 4 tables* >>> CREATE EXTERNAL TABLE testjoin1( joincol string ); >>> CREATE EXTERNAL TABLE testjoin2( >>> anothercol string , >>> joincol string); >>> >>> CREATE EXTERNAL TABLE testjoin3( anothercol string); >>> >>> CREATE EXTERNAL TABLE testjoin4( >>> joincol string, >>> wherecol string , >>> wherecol2 string); >>> >>> *2. Insert sample data * >>> (Note: Make sure you firstly create the dual table which only contains 1 >>> row) >>> >>> insert into table testjoin1 select '1' from dual; >>> insert into table testjoin2 select 'another','1' from dual; >>> insert into table testjoin3 select 'another' from dual; >>> insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual; >>> insert into table testjoin4 select >>> '1','I_Shouldnot_be_in_output','201501' from >>> dual; >>> >>> hive> select * from testjoin1; >>> OK >>> 1 >>> Time taken: 0.04 seconds, Fetched: 1 row(s) >>> >>> hive> select * from testjoin2; >>> OK >>> another 1 >>> Time taken: 0.039 seconds, Fetched: 1 row(s) >>> >>> hive> select * from testjoin3; >>> OK >>> another >>> Time taken: 0.038 seconds, Fetched: 1 row(s) >>> >>> hive> select * from testjoin4; >>> OK >>> 1 I_AM_MISSING 201501 >>> 1 I_Shouldnot_be_in_output 201501 >>> Time taken: 0.04 seconds, Fetched: 2 row(s) >>> >>> *3. SQL1 is returning wrong results.* >>> >>> Select testjoin4.* From >>> testjoin1 >>> JOIN testjoin2 >>> ON (testjoin2.joincol = testjoin1.joincol) >>> JOIN testjoin3 >>> ON (testjoin3.anothercol= testjoin2.anothercol) >>> JOIN testjoin4 >>> ON (testjoin4.joincol = testjoin1.joincol AND >>> testjoin4.wherecol2='201501') >>> WHERE (testjoin4.wherecol='I_AM_MISSING'); >>> >>> 1 I_AM_MISSING 201501 >>> 1 I_Shouldnot_be_in_output 201501 >>> Time taken: 21.702 seconds, Fetched: 2 row(s) >>> >>> >>> *4. SQL2 is returning good result(If we move the both filters to WHERE >>> clause )* >>> >>> Select testjoin4.* From >>> testjoin1 >>> JOIN testjoin2 >>> ON (testjoin2.joincol = testjoin1.joincol) >>> JOIN testjoin3 >>> ON (testjoin3.anothercol= testjoin2.anothercol) >>> JOIN testjoin4 >>> ON (testjoin4.joincol = testjoin1.joincol) >>> WHERE (testjoin4.wherecol='I_AM_MISSING' and >>> testjoin4.wherecol2='201501'); >>> >>> 1 I_AM_MISSING 201501 >>> Time taken: 20.393 seconds, Fetched: 1 row(s) >>> ————— >>> *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are >>> returning wrong results….* >>> >>> 1 I_AM_MISSING 201501 >>> 1 I_AM_MISSING 201501 >>> Time taken: 13.983 seconds, Fetched: 2 row(s) >>> >>> *Anybody knows any related JIRAs?* >>> >>> -- >>> Thanks, >>> www.openkb.info >>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool) >>> >> >> >> >> -- >> Thanks, >> www.openkb.info >> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool) >> > > -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)