[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122708#comment-16122708 ]

Junjie Chen commented on HIVE-17261:
------------------------------------

Actually, Hive uses two deprecated Parquet APIs: one is ParquetInputSplit, the other is filterRowGroup. This is because Parquet introduced a new dictionary filter. The key point here is how to leverage both the statistics filter and the dictionary filter. In the existing code, Hive explicitly applies the statistics filter on the Hive side. To apply both the statistics filter and the dictionary filter, we can either change the filterRowGroup call explicitly, or pass the predicate through the job configuration to Parquet and filter on the Parquet side. The patch I provided passes the predicate to Parquet and skips the explicit filtering on the Hive side.

> Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-17261
>                 URL: https://issues.apache.org/jira/browse/HIVE-17261
>             Project: Hive
>          Issue Type: Improvement
>          Components: Database/Schema
>    Affects Versions: 2.2.0
>            Reporter: Junjie Chen
>            Assignee: Junjie Chen
>            Priority: Minor
>         Attachments: HIVE-17261.2.patch, HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit in
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old interface sets the rowgroupoffset values, which causes Parquet to skip
> the dictionary filter.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
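
For readers unfamiliar with the two filter levels mentioned in the comment, the following is a minimal sketch of why applying only the statistics filter leaves work on the table. It is a toy model, not Parquet or Hive code: the class RowGroupFilterSketch, the RowGroup type, and all method names are hypothetical and exist only for illustration. It assumes an equality predicate on a single integer column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of row-group filtering; these are NOT Parquet or Hive classes.
final class RowGroupFilterSketch {

    /** Illustrative row-group metadata: min/max statistics plus an optional dictionary. */
    static final class RowGroup {
        final String name;
        final int min;
        final int max;
        final Set<Integer> dictionary; // null when no usable dictionary is available

        RowGroup(String name, int min, int max, Set<Integer> dictionary) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.dictionary = dictionary;
        }
    }

    /** Statistics filter: the value can only be present if it lies within [min, max]. */
    static boolean statsMightMatch(RowGroup rg, int value) {
        return value >= rg.min && value <= rg.max;
    }

    /** Dictionary filter: if a dictionary is available, the value must appear in it. */
    static boolean dictionaryMightMatch(RowGroup rg, int value) {
        return rg.dictionary == null || rg.dictionary.contains(value);
    }

    /** Keeps the names of row groups that survive both filters for "column == value". */
    static List<String> select(List<RowGroup> groups, int value) {
        List<String> kept = new ArrayList<>();
        for (RowGroup rg : groups) {
            if (statsMightMatch(rg, value) && dictionaryMightMatch(rg, value)) {
                kept.add(rg.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
                new RowGroup("rg0", 1, 10, Set.of(1, 5, 10)), // passes both filters
                new RowGroup("rg1", 1, 10, Set.of(1, 10)),    // passes statistics, fails dictionary
                new RowGroup("rg2", 20, 30, Set.of(20, 30)),  // fails statistics
                new RowGroup("rg3", 1, 10, null));            // no dictionary, statistics decide
        System.out.println(select(groups, 5));
    }
}
```

In this model, rg1 is the interesting case: its min/max range contains the value 5, so a statistics-only filter would still read it, while the dictionary check can rule it out. That is the kind of row group the comment expects Parquet to skip once the predicate is handed over through the job configuration instead of being applied only on the Hive side.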