[ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763302#comment-15763302
 ] 

Rui Li commented on HIVE-9153:
------------------------------

I guess no configuration is suitable for all cases :) If I remember correctly, 
a smaller "mapreduce.input.fileinputformat.split.maxsize" means more map tasks, 
which hurts performance when the data size is relatively large. So increasing 
it should help in most cases. Of course, users should adjust it according to 
the cluster deployment, executor resources, etc.
I'm not sure what you mean by performance test JIRAs. We have quite a few JIRAs 
to improve performance, and I think each such JIRA involves some simple 
performance test to verify the improvement. But I don't remember all of them.
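
To make the relationship between the max split size and the number of map 
tasks concrete, here is a rough back-of-the-envelope sketch in Python. It 
assumes the input is simply cut into pieces of at most the max split size, 
which ignores file boundaries, block locality and the other split settings, so 
the real task count will differ. The function name and the sizes are 
hypothetical and chosen only for illustration; they are not Hive defaults.

```python
def estimate_split_count(total_input_bytes: int, max_split_bytes: int) -> int:
    """Rough estimate of the number of splits (hypothetical helper).

    Assumes one split per max_split_bytes of input, rounded up. The real
    split computation also depends on file and block layout.
    """
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    # Integer ceiling division, avoids floating-point rounding on large sizes.
    return (total_input_bytes + max_split_bytes - 1) // max_split_bytes


MIB = 2**20
GIB = 2**30
TIB = 2**40

# Illustrative values only: 1 TiB of input with two candidate max split sizes.
small_split_tasks = estimate_split_count(TIB, 256 * MIB)
large_split_tasks = estimate_split_count(TIB, 1 * GIB)
print(small_split_tasks, large_split_tasks)
```

With these illustrative numbers, raising the max split size from 256 MiB to 
1 GiB cuts the estimated task count for 1 TiB of input from 4096 to 1024.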

> Perf enhancement on CombineHiveInputFormat and HiveInputFormat
> --------------------------------------------------------------
>
>                 Key: HIVE-9153
>                 URL: https://issues.apache.org/jira/browse/HIVE-9153
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Rui Li
>             Fix For: 1.1.0
>
>         Attachments: HIVE-9153.1-spark.patch, HIVE-9153.1-spark.patch, 
> HIVE-9153.2.patch, HIVE-9153.3.patch, screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}}, so Hive on Spark (HOS) 
> uses it. However, Tez uses {{HiveInputFormat}}. Since tasks are relatively 
> cheap in Spark, it might make sense for us to use {{HiveInputFormat}} as well. 
> We should evaluate this on a query that has many input splits, such as 
> {{select count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
