[jira] [Commented] (HIVE-20056) SparkPartitionPruner shouldn't be triggered by Spark tasks

Sahil Takiar (JIRA) Thu, 19 Jul 2018 09:50:07 -0700


    [ https://issues.apache.org/jira/browse/HIVE-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549529#comment-16549529 ]


Sahil Takiar commented on HIVE-20056:
-------------------------------------

[~lirui] could you take a look? We invoke {{SparkDynamicPartitionPruner}} 
whenever {{init}} is called in {{HiveInputFormat}}, and {{init}} is called from 
both {{getSplits}} and {{getRecordReader}}. As a result, the pruner runs for 
every file that we open inside a HoS (Hive-on-Spark) task, and each run reads 
the associated file on HDFS. This change ensures that pruning is done only 
once.
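
To illustrate the idea, here is a minimal sketch of keeping the pruner out of 
the per-file path. It is not the attached patch and does not use Hive's real 
classes: {{PruneOnceInputFormat}}, {{prune}} and {{pruneCalls}} are 
hypothetical names, and the assumption is that pruning only needs to run 
during split generation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input format; NOT Hive's HiveInputFormat.
class PruneOnceInputFormat {
    private int pruneCalls = 0;          // how many times the pruner ran
    private boolean initialized = false; // guards the shared setup

    // Stand-in for the pruner, which in the real system reads a file on HDFS.
    private void prune() {
        pruneCalls++;
    }

    // Shared setup used by both entry points; it does no pruning.
    private void init() {
        if (!initialized) {
            initialized = true;
        }
    }

    // Split generation runs once per job, so pruning is invoked here.
    List<String> getSplits(List<String> files) {
        init();
        prune();
        return new ArrayList<>(files);
    }

    // Called once per opened file inside a task, so it must not prune.
    String getRecordReader(String split) {
        init();
        return "reader:" + split;
    }

    int pruneCalls() {
        return pruneCalls;
    }

    public static void main(String[] args) {
        PruneOnceInputFormat format = new PruneOnceInputFormat();
        List<String> splits = format.getSplits(Arrays.asList("f1", "f2", "f3"));
        for (String split : splits) {
            format.getRecordReader(split);
        }
        System.out.println("prune calls: " + format.pruneCalls());
    }
}
```

With three files opened, the pruner stand-in still runs a single time, because 
only the split-generation path triggers it.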

> SparkPartitionPruner shouldn't be triggered by Spark tasks
> ----------------------------------------------------------
>
>                 Key: HIVE-20056
>                 URL: https://issues.apache.org/jira/browse/HIVE-20056
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-20056.1.patch
>
>
> It looks like {{SparkDynamicPartitionPruner}} is being called by every Spark 
> task because it gets created whenever {{getRecordReader}} is called on the 
> associated {{InputFormat}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
