Not solution to the problem on HDP 2.6.5, but I have tested the first script in Hive 2.3.4 and Hive 3.1.1. On Hive 2.3.4, it returns 1 row, and on Hive 3.1.1, it returns no row. So, I guess the bug is still in HDP 2.6.5.
--- Sungwoo On Tue, Apr 23, 2019 at 7:40 PM Rajat Khandelwal <pro...@apache.org> wrote: > Hi > > I've recently noticed incorrect behaviour from Hive Query Planner. The > simplest example I could construct is as follows > > SELECT > tbl3.col2 AS current_regularity_streak > FROM (select 1 col1) tbl1 > LEFT JOIN > (select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1 > LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1 > WHERE > tbl1.col1 in (select 1 col1 union all select 2) > AND > tbl3.col2 >= 2 > > > The query should logically return no rows, but it does! It returns 1 row > with 1 column, with value = 1. The value=1 should have been filtered out by > the filter tbl3.col2 >= 2 > So df > > On further examination, I believe the culprit is the IN clause. If I > remove this, the query works correctly and returns 0 rows. > > SELECT > tbl3.col2 AS current_regularity_streak > FROM (select 1 col1) tbl1 > LEFT JOIN > (select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1 > LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1 > WHERE > tbl3.col2 >= 2 > > Is this a known issue? I couldn't find anything on JIRA/Stack > overflow/Google. > > I further analyzed using EXPLAIN FORMATTED and noticed that the plan of > the first query doesn't contain the >=2 predicate. The plan of the second > query does. I wonder how the planner could omit the filter clause > altogether? > > I'm using HDP 2.6.5.10-2. >