We recently found a bug in ORC predicate pushdown (PPD). Here is the test case:

use test;
create table if not exists test_orc_src (a int, b int, c int) stored as orc;
create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
insert overwrite table test_orc_src select 1,2,3 from dim.city limit 1;
insert overwrite table test_orc_src2 select 1,2,4 from dim.city limit 1;
set hive.auto.convert.join = false;
select
  tb.c
from test.test_orc_src tb
join test.test_orc_src2 tm
on tb.a = tm.a
where tb.b = 2;

The correct result of the above query is 3, but it returns an empty result. We
found that ORC PPD uses the READ_COLUMN_NAMES_CONF_STR property to get the
required column list, and that this list is not constructed correctly when one
table's storage path is a prefix of another table's path. This bug is related
to HIVE-1903 <https://issues.apache.org/jira/browse/HIVE-1903>. In
HiveInputFormat#pushProjectionsAndFilters, a prefix match is used to get all
aliases associated with the given path, which I don't think is appropriate. I
don't know why a prefix match is used here instead of an exact match.
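
To make the problem concrete, here is a minimal Java sketch of the two matching strategies. It is not the actual Hive code: the class, the method names, and the warehouse paths are all made up for illustration, and I am assuming the match is of the form "split path starts with the registered table path". With that assumption, a split that belongs to test_orc_src2 also picks up the alias of test_orc_src.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive's implementation.
class PrefixMatchSketch {

    // Hypothetical path-to-alias mapping for the two tables in the test case.
    static Map<String, List<String>> samplePaths() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("/warehouse/test.db/test_orc_src", List.of("tb"));
        m.put("/warehouse/test.db/test_orc_src2", List.of("tm"));
        return m;
    }

    // Prefix match: an alias is selected if the split path starts with its table path.
    static List<String> aliasesByPrefix(Map<String, List<String>> pathToAliases, String splitPath) {
        List<String> aliases = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            if (splitPath.startsWith(e.getKey())) {
                aliases.addAll(e.getValue());
            }
        }
        return aliases;
    }

    // Exact match: an alias is selected only if its table path equals the split's directory.
    static List<String> aliasesByExactDir(Map<String, List<String>> pathToAliases, String splitDir) {
        List<String> aliases = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            if (e.getKey().equals(splitDir)) {
                aliases.addAll(e.getValue());
            }
        }
        return aliases;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = samplePaths();
        // A file of test_orc_src2 also matches the path of test_orc_src.
        System.out.println(aliasesByPrefix(m, "/warehouse/test.db/test_orc_src2/000000_0")); // [tb, tm]
        // Comparing the directory exactly returns only the alias of test_orc_src2.
        System.out.println(aliasesByExactDir(m, "/warehouse/test.db/test_orc_src2")); // [tm]
    }
}
```

If the aliases are over-collected like this, the column list built for the split would mix columns from both tables, which could explain the wrong PPD result. An exact match on the directory is only one possible fix; partitioned tables would probably need a prefix match that stops at a path separator instead.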
Any help is appreciated.
