[ https://issues.apache.org/jira/browse/HIVE-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549529#comment-16549529 ]
Sahil Takiar commented on HIVE-20056:
-------------------------------------

[~lirui] could you take a look? It seems that we call {{SparkPartitionPruner}} whenever we call {{init}} in {{HiveInputFormat}}. However, {{init}} is called in both {{getSplits}} and {{getRecordReader}}, which means we call {{SparkPartitionPruner}} for every file that we open inside a HoS task. Calling the pruner means reading the associated file on HDFS, so that read is repeated for every file. This change ensures that the pruning is done only once.

> SparkPartitionPruner shouldn't be triggered by Spark tasks
> ----------------------------------------------------------
>
>                 Key: HIVE-20056
>                 URL: https://issues.apache.org/jira/browse/HIVE-20056
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-20056.1.patch
>
>
> It looks like {{SparkDynamicPartitionPruner}} is being called by every Spark
> task because it gets created whenever {{getRecordReader}} is called on the
> associated {{InputFormat}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
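The call pattern described in the comment (a shared {{init}} reached from both {{getSplits}} and {{getRecordReader}}) can be modeled with a small, hypothetical Java sketch. `ToyInputFormat`, `CountingPruner`, and the `pruneOnlyForSplits` flag are illustrative assumptions, not Hive's real classes and not the contents of HIVE-20056.1.patch. The sketch only counts pruner invocations to show why pruning from {{init}} scales with the number of files a task opens, and how restricting it to split computation brings it down to one call.

```java
import java.util.List;

// Hypothetical, simplified model of the call pattern described in the comment.
// None of these classes are Hive's real classes; all names are illustrative.
public class PrunerCallSketch {

    // Stand-in for the pruner; counts how often it would read its file on HDFS.
    static class CountingPruner {
        int invocations = 0;

        void prune() {
            invocations++; // each call models one read of the pruning file
        }
    }

    // Stand-in for an input format whose init() is shared by both entry points.
    static class ToyInputFormat {
        private final CountingPruner pruner;
        private final boolean pruneOnlyForSplits; // assumed model of "prune only once"

        ToyInputFormat(CountingPruner pruner, boolean pruneOnlyForSplits) {
            this.pruner = pruner;
            this.pruneOnlyForSplits = pruneOnlyForSplits;
        }

        private void init(boolean calledFromGetSplits) {
            // Without the guard, every init() prunes; with it, only split computation does.
            if (!pruneOnlyForSplits || calledFromGetSplits) {
                pruner.prune();
            }
        }

        void getSplits() {
            init(true); // runs once, when splits are computed
        }

        void getRecordReader(String file) {
            init(false); // runs inside a task, once per file opened
        }
    }

    // Computes splits once, then opens each file, and reports how often the pruner ran.
    static int countPrunerCalls(boolean pruneOnlyForSplits, List<String> files) {
        CountingPruner pruner = new CountingPruner();
        ToyInputFormat format = new ToyInputFormat(pruner, pruneOnlyForSplits);
        format.getSplits();
        for (String f : files) {
            format.getRecordReader(f);
        }
        return pruner.invocations;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part-0", "part-1", "part-2");
        System.out.println("before fix: " + countPrunerCalls(false, files)); // 4
        System.out.println("after fix:  " + countPrunerCalls(true, files));  // 1
    }
}
```

With three files, the unguarded model prunes four times (once for splits, once per file), while the guarded model prunes once. A real fix could equally move the pruning out of {{init}} entirely; the flag here is only a compact way to compare the two behaviors.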