Thanks Ashutosh.
Actually for this kind of query, if I put the 2 filters in WHERE clause
instead of ON clause, the query result is correct.

Do you suggest we put all filters into WHERE or OR clause? And Why?

On Wed, Sep 2, 2015 at 3:13 PM, Ashutosh Chauhan <hashut...@apache.org>
wrote:

> It indeed is. Title of bug is symptom of problem and
> doesn't accurately describe the problem. Bug will be triggered if following
> conditions are met:
>
> If query contains 3 or more joins
> AND
> joins are merged (i.e. tables participating in two of those joins are
> joined on same keys)
> AND
> these merged joins are not consecutive in query
> AND
> there is a filter on one of tables who participated in merged join which
> is in WHERE clause (not as join condition)
> then said filter will be dropped.
>
> Query you posted meets all these criteria. You can avoid this bug if you
> rewrite your query such that it violates one of the requirement (listed
> above) to trigger the bug.
>
> Ashutosh
>
>
> On Wed, Sep 2, 2015 at 10:19 AM, Jim Green <openkbi...@gmail.com> wrote:
>
>> Hi Ashutosh,
>>
>> Is Hive-10841 related? from the title of that jira, it sais “where col is
>> not null”caused the issue; however above reproduce did not have that clause.
>>
>>
>>
>> On Wed, Sep 2, 2015 at 2:24 AM, Ashutosh Chauhan <hashut...@apache.org>
>> wrote:
>>
>>> https://issues.apache.org/jira/browse/HIVE-10841
>>>
>>> Thanks,
>>> Ashutosh
>>>
>>> On Tue, Sep 1, 2015 at 6:00 PM, Jim Green <openkbi...@gmail.com> wrote:
>>>
>>>> Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related
>>>> and the possibility to backport this fix into Hive 0.13?
>>>>
>>>>
>>>> On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbi...@gmail.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> Below is the minimum reproduce of wrong results in Hive 0.13:
>>>>>
>>>>> *1. Create 4 tables*
>>>>> CREATE EXTERNAL TABLE testjoin1( joincol string );
>>>>> CREATE EXTERNAL TABLE testjoin2(
>>>>>    anothercol string ,
>>>>>    joincol string);
>>>>>
>>>>> CREATE EXTERNAL TABLE testjoin3( anothercol string);
>>>>>
>>>>> CREATE EXTERNAL TABLE testjoin4(
>>>>>   joincol string,
>>>>>   wherecol string ,
>>>>>   wherecol2 string);
>>>>>
>>>>> *2. Insert sample data *
>>>>> (Note: Make sure you firstly create the dual table which only contains
>>>>> 1 row)
>>>>>
>>>>> insert into table testjoin1 select '1' from dual;
>>>>> insert into table testjoin2 select 'another','1' from dual;
>>>>> insert into table testjoin3 select 'another' from dual;
>>>>> insert into table testjoin4 select '1','I_AM_MISSING','201501' from
>>>>> dual;
>>>>> insert into table testjoin4 select
>>>>> '1','I_Shouldnot_be_in_output','201501' from
>>>>> dual;
>>>>>
>>>>> hive> select * from testjoin1;
>>>>> OK
>>>>> 1
>>>>> Time taken: 0.04 seconds, Fetched: 1 row(s)
>>>>>
>>>>> hive> select * from testjoin2;
>>>>> OK
>>>>> another    1
>>>>> Time taken: 0.039 seconds, Fetched: 1 row(s)
>>>>>
>>>>> hive> select * from testjoin3;
>>>>> OK
>>>>> another
>>>>> Time taken: 0.038 seconds, Fetched: 1 row(s)
>>>>>
>>>>> hive> select * from testjoin4;
>>>>> OK
>>>>> 1    I_AM_MISSING    201501
>>>>> 1    I_Shouldnot_be_in_output    201501
>>>>> Time taken: 0.04 seconds, Fetched: 2 row(s)
>>>>>
>>>>> *3. SQL1 is returning wrong results.*
>>>>>
>>>>> Select testjoin4.* From
>>>>> testjoin1
>>>>> JOIN testjoin2
>>>>>   ON (testjoin2.joincol = testjoin1.joincol)
>>>>> JOIN testjoin3
>>>>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>>>>> JOIN testjoin4
>>>>>   ON (testjoin4.joincol = testjoin1.joincol AND
>>>>> testjoin4.wherecol2='201501')
>>>>> WHERE (testjoin4.wherecol='I_AM_MISSING');
>>>>>
>>>>> 1    I_AM_MISSING    201501
>>>>> 1    I_Shouldnot_be_in_output    201501
>>>>> Time taken: 21.702 seconds, Fetched: 2 row(s)
>>>>>
>>>>>
>>>>> *4. SQL2 is returning good result(If we move the both filters to WHERE
>>>>> clause )*
>>>>>
>>>>> Select testjoin4.* From
>>>>> testjoin1
>>>>> JOIN testjoin2
>>>>>   ON (testjoin2.joincol = testjoin1.joincol)
>>>>> JOIN testjoin3
>>>>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>>>>> JOIN testjoin4
>>>>>   ON (testjoin4.joincol = testjoin1.joincol)
>>>>> WHERE (testjoin4.wherecol='I_AM_MISSING' and
>>>>> testjoin4.wherecol2='201501');
>>>>>
>>>>> 1    I_AM_MISSING    201501
>>>>> Time taken: 20.393 seconds, Fetched: 1 row(s)
>>>>> —————
>>>>> *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are
>>>>> returning wrong results….*
>>>>>
>>>>> 1 I_AM_MISSING 201501
>>>>> 1 I_AM_MISSING 201501
>>>>> Time taken: 13.983 seconds, Fetched: 2 row(s)
>>>>>
>>>>> *Anybody knows any related JIRAs?*
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> www.openkb.info
>>>>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> www.openkb.info
>>>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>> www.openkb.info
>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>
>
>


-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Reply via email to