Hi Ashutosh,

Is Hive-10841 related? from the title of that jira, it sais “where col is
not null”caused the issue; however above reproduce did not have that clause.



On Wed, Sep 2, 2015 at 2:24 AM, Ashutosh Chauhan <hashut...@apache.org>
wrote:

> https://issues.apache.org/jira/browse/HIVE-10841
>
> Thanks,
> Ashutosh
>
> On Tue, Sep 1, 2015 at 6:00 PM, Jim Green <openkbi...@gmail.com> wrote:
>
>> Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related
>> and the possibility to backport this fix into Hive 0.13?
>>
>>
>> On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbi...@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> Below is the minimum reproduce of wrong results in Hive 0.13:
>>>
>>> *1. Create 4 tables*
>>> CREATE EXTERNAL TABLE testjoin1( joincol string );
>>> CREATE EXTERNAL TABLE testjoin2(
>>>    anothercol string ,
>>>    joincol string);
>>>
>>> CREATE EXTERNAL TABLE testjoin3( anothercol string);
>>>
>>> CREATE EXTERNAL TABLE testjoin4(
>>>   joincol string,
>>>   wherecol string ,
>>>   wherecol2 string);
>>>
>>> *2. Insert sample data *
>>> (Note: Make sure you firstly create the dual table which only contains 1
>>> row)
>>>
>>> insert into table testjoin1 select '1' from dual;
>>> insert into table testjoin2 select 'another','1' from dual;
>>> insert into table testjoin3 select 'another' from dual;
>>> insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual;
>>> insert into table testjoin4 select
>>> '1','I_Shouldnot_be_in_output','201501' from
>>> dual;
>>>
>>> hive> select * from testjoin1;
>>> OK
>>> 1
>>> Time taken: 0.04 seconds, Fetched: 1 row(s)
>>>
>>> hive> select * from testjoin2;
>>> OK
>>> another    1
>>> Time taken: 0.039 seconds, Fetched: 1 row(s)
>>>
>>> hive> select * from testjoin3;
>>> OK
>>> another
>>> Time taken: 0.038 seconds, Fetched: 1 row(s)
>>>
>>> hive> select * from testjoin4;
>>> OK
>>> 1    I_AM_MISSING    201501
>>> 1    I_Shouldnot_be_in_output    201501
>>> Time taken: 0.04 seconds, Fetched: 2 row(s)
>>>
>>> *3. SQL1 is returning wrong results.*
>>>
>>> Select testjoin4.* From
>>> testjoin1
>>> JOIN testjoin2
>>>   ON (testjoin2.joincol = testjoin1.joincol)
>>> JOIN testjoin3
>>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>>> JOIN testjoin4
>>>   ON (testjoin4.joincol = testjoin1.joincol AND
>>> testjoin4.wherecol2='201501')
>>> WHERE (testjoin4.wherecol='I_AM_MISSING');
>>>
>>> 1    I_AM_MISSING    201501
>>> 1    I_Shouldnot_be_in_output    201501
>>> Time taken: 21.702 seconds, Fetched: 2 row(s)
>>>
>>>
>>> *4. SQL2 is returning good result(If we move the both filters to WHERE
>>> clause )*
>>>
>>> Select testjoin4.* From
>>> testjoin1
>>> JOIN testjoin2
>>>   ON (testjoin2.joincol = testjoin1.joincol)
>>> JOIN testjoin3
>>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>>> JOIN testjoin4
>>>   ON (testjoin4.joincol = testjoin1.joincol)
>>> WHERE (testjoin4.wherecol='I_AM_MISSING' and
>>> testjoin4.wherecol2='201501');
>>>
>>> 1    I_AM_MISSING    201501
>>> Time taken: 20.393 seconds, Fetched: 1 row(s)
>>> —————
>>> *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are
>>> returning wrong results….*
>>>
>>> 1 I_AM_MISSING 201501
>>> 1 I_AM_MISSING 201501
>>> Time taken: 13.983 seconds, Fetched: 2 row(s)
>>>
>>> *Anybody knows any related JIRAs?*
>>>
>>> --
>>> Thanks,
>>> www.openkb.info
>>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>>
>>
>>
>>
>> --
>> Thanks,
>> www.openkb.info
>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>
>
>


-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Reply via email to