Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related and
the possibility to backport this fix into Hive 0.13?


On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbi...@gmail.com> wrote:

> Hi Team,
>
> Below is the minimum reproduce of wrong results in Hive 0.13:
>
> *1. Create 4 tables*
> CREATE EXTERNAL TABLE testjoin1( joincol string );
> CREATE EXTERNAL TABLE testjoin2(
>    anothercol string ,
>    joincol string);
>
> CREATE EXTERNAL TABLE testjoin3( anothercol string);
>
> CREATE EXTERNAL TABLE testjoin4(
>   joincol string,
>   wherecol string ,
>   wherecol2 string);
>
> *2. Insert sample data *
> (Note: Make sure you firstly create the dual table which only contains 1
> row)
>
> insert into table testjoin1 select '1' from dual;
> insert into table testjoin2 select 'another','1' from dual;
> insert into table testjoin3 select 'another' from dual;
> insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual;
> insert into table testjoin4 select '1','I_Shouldnot_be_in_output','201501'
> from
> dual;
>
> hive> select * from testjoin1;
> OK
> 1
> Time taken: 0.04 seconds, Fetched: 1 row(s)
>
> hive> select * from testjoin2;
> OK
> another    1
> Time taken: 0.039 seconds, Fetched: 1 row(s)
>
> hive> select * from testjoin3;
> OK
> another
> Time taken: 0.038 seconds, Fetched: 1 row(s)
>
> hive> select * from testjoin4;
> OK
> 1    I_AM_MISSING    201501
> 1    I_Shouldnot_be_in_output    201501
> Time taken: 0.04 seconds, Fetched: 2 row(s)
>
> *3. SQL1 is returning wrong results.*
>
> Select testjoin4.* From
> testjoin1
> JOIN testjoin2
>   ON (testjoin2.joincol = testjoin1.joincol)
> JOIN testjoin3
>   ON (testjoin3.anothercol= testjoin2.anothercol)
> JOIN testjoin4
>   ON (testjoin4.joincol = testjoin1.joincol AND
> testjoin4.wherecol2='201501')
> WHERE (testjoin4.wherecol='I_AM_MISSING');
>
> 1    I_AM_MISSING    201501
> 1    I_Shouldnot_be_in_output    201501
> Time taken: 21.702 seconds, Fetched: 2 row(s)
>
>
> *4. SQL2 is returning good result(If we move the both filters to WHERE
> clause )*
>
> Select testjoin4.* From
> testjoin1
> JOIN testjoin2
>   ON (testjoin2.joincol = testjoin1.joincol)
> JOIN testjoin3
>   ON (testjoin3.anothercol= testjoin2.anothercol)
> JOIN testjoin4
>   ON (testjoin4.joincol = testjoin1.joincol)
> WHERE (testjoin4.wherecol='I_AM_MISSING' and testjoin4.wherecol2='201501');
>
> 1    I_AM_MISSING    201501
> Time taken: 20.393 seconds, Fetched: 1 row(s)
> —————
> *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are
> returning wrong results….*
>
> 1 I_AM_MISSING 201501
> 1 I_AM_MISSING 201501
> Time taken: 13.983 seconds, Fetched: 2 row(s)
>
> *Anybody knows any related JIRAs?*
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Reply via email to