@Prasanth would you help me look into this problem? Thanks.
On Mon Jan 05 2015 at 12:03:42 AM, wzc <[email protected]> wrote:

> Recently we found a bug with ORC PPD. Here is the test case:
>
> use test;
> create table if not exists test_orc_src (a int, b int, c int)
>   stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int)
>   stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from dim.city
>   limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from dim.city
>   limit 1;
> set hive.auto.convert.join = false;
> select
>   tb.c
> from test.test_orc_src tb
> join test.test_orc_src2 tm
>   on tb.a = tm.a
> where tb.b = 2
>
> The correct answer for the above query is 3, but it returns an empty
> result. We found that ORC PPD uses the READ_COLUMN_NAMES_CONF_STR
> property to get the required column list, and that list is not
> constructed correctly when one table's storage path is a prefix of
> another table's path. This bug is related to HIVE-1903
> <https://issues.apache.org/jira/browse/HIVE-1903>. In
> HiveInputFormat#pushProjectionsAndFilters, a prefix match is used to
> get all aliases associated with the given path, which I think is not
> very suitable. I don't know why we should do a prefix match here
> instead of an exact match.
>
> Any help is appreciated.
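
To make the path issue described above concrete, here is a minimal sketch of how a prefix match can pick up the wrong alias when one table path is a prefix of another. This is not Hive's actual code: the class name, the helper methods, and the /warehouse/... paths are hypothetical, and the map is only a stand-in for whatever path-to-alias mapping HiveInputFormat consults.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathAliasMatchDemo {

    // Collect every alias whose registered path is a string prefix of the given path.
    // (Hypothetical helper; illustrates the prefix-match behavior described in the mail.)
    static List<String> aliasesByPrefix(Map<String, String> pathToAlias, String path) {
        List<String> aliases = new ArrayList<>();
        for (Map.Entry<String, String> entry : pathToAlias.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                aliases.add(entry.getValue());
            }
        }
        return aliases;
    }

    // Collect only the aliases whose registered path equals the given path.
    static List<String> aliasesByExact(Map<String, String> pathToAlias, String path) {
        List<String> aliases = new ArrayList<>();
        for (Map.Entry<String, String> entry : pathToAlias.entrySet()) {
            if (path.equals(entry.getKey())) {
                aliases.add(entry.getValue());
            }
        }
        return aliases;
    }

    public static void main(String[] args) {
        // Assumed storage locations; the real warehouse paths are not given in the mail.
        Map<String, String> pathToAlias = new LinkedHashMap<>();
        pathToAlias.put("/warehouse/test.db/test_orc_src", "tb");
        pathToAlias.put("/warehouse/test.db/test_orc_src2", "tm");

        String path = "/warehouse/test.db/test_orc_src2";

        // The prefix match also returns tb, because "...test_orc_src" is a prefix
        // of "...test_orc_src2".
        System.out.println("prefix: " + aliasesByPrefix(pathToAlias, path));
        // The exact match returns only tm.
        System.out.println("exact:  " + aliasesByExact(pathToAlias, path));
    }
}
```

In this sketch, a split under test_orc_src2 is associated with both aliases under the prefix match, so the columns and filter of tb (such as b = 2) could be applied to the wrong table. Note that the sketch compares table directories only; real split paths usually point at files below the table directory, which may be one reason a prefix match was used in the first place.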
