We recently found a bug in ORC predicate pushdown (PPD). Here is the test case:

use test;
create table if not exists test_orc_src (a int, b int, c int) stored as orc;
create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
insert overwrite table test_orc_src select 1,2,3 from dim.city limit 1;
insert overwrite table test_orc_src2 select 1,2,4 from dim.city limit 1;
set hive.auto.convert.join = false;
select tb.c from test.test_orc_src tb join test.test_orc_src2 tm on tb.a = tm.a where tb.b = 2;

The correct result of the above query is 3, but it returns an empty result. We found that ORC PPD uses the READ_COLUMN_NAMES_CONF_STR property to get the required column list, and that this list is not constructed correctly when the storage path of one table is a prefix of the path of another table (as with test_orc_src and test_orc_src2 above).

This bug is related to HIVE-1903 <https://issues.apache.org/jira/browse/HIVE-1903>. HiveInputFormat#pushProjectionsAndFilters uses a prefix match to get all aliases associated with the given path, which I think is not appropriate. I don't know why a prefix match is used here instead of an exact match. Any help is appreciated.
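
To make the suspected cause concrete, here is a minimal Java sketch of the difference between a prefix match and an exact match on paths. It is not the actual Hive code: the class name, the method names, the map contents, and the /warehouse/... paths are all assumptions made for illustration, and it assumes the path being looked up is the table directory itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names and paths, not Hive's implementation.
class PathAliasMatch {

    // Hypothetical path-to-aliases mapping for the two tables in the test case.
    static Map<String, List<String>> pathToAliases() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("/warehouse/test.db/test_orc_src", Arrays.asList("tb"));
        m.put("/warehouse/test.db/test_orc_src2", Arrays.asList("tm"));
        return m;
    }

    // Prefix match: collects the aliases of every key that the path starts with.
    static List<String> aliasesByPrefix(Map<String, List<String>> m, String path) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : m.entrySet()) {
            if (path.startsWith(e.getKey())) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    // Exact match: collects the aliases only of the key equal to the path.
    static List<String> aliasesByExact(Map<String, List<String>> m, String path) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : m.entrySet()) {
            if (path.equals(e.getKey())) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = pathToAliases();
        String path = "/warehouse/test.db/test_orc_src2";
        System.out.println(aliasesByPrefix(m, path)); // prints [tb, tm]
        System.out.println(aliasesByExact(m, path));  // prints [tm]
    }
}
```

With the prefix match, the path of test_orc_src2 also picks up the alias tb of test_orc_src, because "/warehouse/test.db/test_orc_src" is a string prefix of "/warehouse/test.db/test_orc_src2". If the column list is built from the aliases found this way, that would be consistent with the wrong column list we observe.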
