Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related and the possibility to backport this fix into Hive 0.13?
On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbi...@gmail.com> wrote: > Hi Team, > > Below is the minimum reproduce of wrong results in Hive 0.13: > > *1. Create 4 tables* > CREATE EXTERNAL TABLE testjoin1( joincol string ); > CREATE EXTERNAL TABLE testjoin2( > anothercol string , > joincol string); > > CREATE EXTERNAL TABLE testjoin3( anothercol string); > > CREATE EXTERNAL TABLE testjoin4( > joincol string, > wherecol string , > wherecol2 string); > > *2. Insert sample data * > (Note: Make sure you firstly create the dual table which only contains 1 > row) > > insert into table testjoin1 select '1' from dual; > insert into table testjoin2 select 'another','1' from dual; > insert into table testjoin3 select 'another' from dual; > insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual; > insert into table testjoin4 select '1','I_Shouldnot_be_in_output','201501' > from > dual; > > hive> select * from testjoin1; > OK > 1 > Time taken: 0.04 seconds, Fetched: 1 row(s) > > hive> select * from testjoin2; > OK > another 1 > Time taken: 0.039 seconds, Fetched: 1 row(s) > > hive> select * from testjoin3; > OK > another > Time taken: 0.038 seconds, Fetched: 1 row(s) > > hive> select * from testjoin4; > OK > 1 I_AM_MISSING 201501 > 1 I_Shouldnot_be_in_output 201501 > Time taken: 0.04 seconds, Fetched: 2 row(s) > > *3. SQL1 is returning wrong results.* > > Select testjoin4.* From > testjoin1 > JOIN testjoin2 > ON (testjoin2.joincol = testjoin1.joincol) > JOIN testjoin3 > ON (testjoin3.anothercol= testjoin2.anothercol) > JOIN testjoin4 > ON (testjoin4.joincol = testjoin1.joincol AND > testjoin4.wherecol2='201501') > WHERE (testjoin4.wherecol='I_AM_MISSING'); > > 1 I_AM_MISSING 201501 > 1 I_Shouldnot_be_in_output 201501 > Time taken: 21.702 seconds, Fetched: 2 row(s) > > > *4. SQL2 is returning good result(If we move the both filters to WHERE > clause )* > > Select testjoin4.* From > testjoin1 > JOIN testjoin2 > ON (testjoin2.joincol = testjoin1.joincol) > JOIN testjoin3 > ON (testjoin3.anothercol= testjoin2.anothercol) > JOIN testjoin4 > ON (testjoin4.joincol = testjoin1.joincol) > WHERE (testjoin4.wherecol='I_AM_MISSING' and testjoin4.wherecol2='201501'); > > 1 I_AM_MISSING 201501 > Time taken: 20.393 seconds, Fetched: 1 row(s) > ————— > *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are > returning wrong results….* > > 1 I_AM_MISSING 201501 > 1 I_AM_MISSING 201501 > Time taken: 13.983 seconds, Fetched: 2 row(s) > > *Anybody knows any related JIRAs?* > > -- > Thanks, > www.openkb.info > (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool) > -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)