[ https://issues.apache.org/jira/browse/HIVE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951730#comment-16951730 ]

Marta Kuczora commented on HIVE-21327:
--------------------------------------

Thanks a lot [~szita] for the review.
Pushed to master, but accidentally with a wrong commit message. The commit for this fix is
[https://github.com/apache/hive/commit/a0ccbff838afb440461a4d6df335f824c1dccbcc]
Added the commit to errata.txt in the scope of https://issues.apache.org/jira/browse/HIVE-22345

> Predicate is not pushed to Parquet if hive.parquet.timestamp.skip.conversion=true
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21327
>                 URL: https://issues.apache.org/jira/browse/HIVE-21327
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21327.1.patch, HIVE-21327.2.patch
>
>
> The Parquet FilterPredicate is created and set to the configuration in the
> ParquetRecordReaderBase.setFilter method. This method is used from the
> ParquetRecordReaderWrapper constructor through the
> ParquetRecordReaderBase.getSplit method and expects a JobConf as parameter
> where it sets the created filter predicate.
> In the ParquetRecordReaderWrapper constructor, multiple JobConf objects are used:
> {noformat}
>     jobConf = oldJobConf;
>     final ParquetInputSplit split = getSplit(oldSplit, jobConf);
>     TaskAttemptID taskAttemptID = TaskAttemptID.forName(jobConf.get(IOConstants.MAPRED_TASK_ID));
>     if (taskAttemptID == null) {
>       taskAttemptID = new TaskAttemptID();
>     }
>     // create a TaskInputOutputContext
>     Configuration conf = jobConf;
>     if (skipTimestampConversion ^ HiveConf.getBoolVar(
>         conf, HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION)) {
>       conf = new JobConf(oldJobConf);
>       HiveConf.setBoolVar(conf,
>           HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION, skipTimestampConversion);
>     }
>     final TaskAttemptContext taskContext = ContextUtil.newTaskAttemptContext(conf, taskAttemptID);
> {noformat}
> So we have the jobConf, oldJobConf and conf objects, and getSplit is called
> with the jobConf object, so the filter predicate will be set in this config
> object. Judging from this code alone, jobConf and oldJobConf should be the
> same reference inside the if statement, so the newly created conf should
> also contain the filter predicate. However, in the getSplit method the value
> of jobConf is changed by the projectionPusher.pushProjectionsAndFilters
> method, so inside the if statement, jobConf and oldJobConf are actually
> different references. The filter predicate is set in jobConf, but if the
> if condition is true, conf is created from oldJobConf, so it won't
> contain the filter predicate.
> Just for reference, this behavior was introduced in
> [HIVE-9873|https://issues.apache.org/jira/browse/HIVE-9873].
> Since the goal of the if statement is only to update the
> HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION property in the configuration, it
> should be using the jobConf where the filter predicate is correctly set.
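
The reference mix-up described in the issue can be reproduced without Hive or Hadoop on the classpath. The following is only a minimal sketch, not the Hive code and not the committed patch: Conf is a hypothetical stand-in for JobConf, the two property keys are made up, and getSplit merely imitates the reported effect (the jobConf field ends up pointing at a new object that carries the filter). The copyFromOld flag switches between copying from oldJobConf, as in the quoted snippet, and copying from jobConf, as the description suggests.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfAliasingSketch {
    /** Hypothetical stand-in for JobConf: a mutable property bag with a copy constructor. */
    static class Conf {
        final Map<String, String> props = new HashMap<>();
        Conf() {}
        Conf(Conf other) { props.putAll(other.props); }
    }

    // Made-up keys, used only for this illustration.
    static final String FILTER_KEY = "sketch.filter.predicate";
    static final String SKIP_KEY = "sketch.timestamp.skip.conversion";

    // Field that getSplit reassigns, mirroring the behavior described in the issue.
    private Conf jobConf;

    /** Imitates getSplit: replaces the jobConf field with a new object and sets the filter on it. */
    private void getSplit() {
        jobConf = new Conf(jobConf);             // stand-in for pushProjectionsAndFilters
        jobConf.props.put(FILTER_KEY, "x > 5");  // stand-in for setFilter
    }

    /** Builds the task configuration; copyFromOld = true follows the quoted snippet. */
    Conf buildTaskConf(Conf oldJobConf, boolean skipConversion, boolean copyFromOld) {
        jobConf = oldJobConf;
        getSplit();                              // jobConf no longer aliases oldJobConf
        Conf conf = jobConf;
        boolean current = Boolean.parseBoolean(conf.props.getOrDefault(SKIP_KEY, "false"));
        if (skipConversion ^ current) {
            // Copying from oldJobConf drops everything getSplit put into jobConf.
            conf = new Conf(copyFromOld ? oldJobConf : jobConf);
            conf.props.put(SKIP_KEY, String.valueOf(skipConversion));
        }
        return conf;
    }

    public static void main(String[] args) {
        ConfAliasingSketch reader = new ConfAliasingSketch();
        Conf fromOld = reader.buildTaskConf(new Conf(), true, true);
        Conf fromJob = reader.buildTaskConf(new Conf(), true, false);
        System.out.println("copied from oldJobConf, filter present: "
                + fromOld.props.containsKey(FILTER_KEY));
        System.out.println("copied from jobConf, filter present: "
                + fromJob.props.containsKey(FILTER_KEY));
    }
}
```

In this sketch the first line prints false and the second prints true: the filter only survives when the copy is taken from the object that getSplit actually modified.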