[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864477#comment-15864477
 ] 

Illya Yalovyy commented on HIVE-15881:
--------------------------------------

I think for Utilities#getInputSummary the property name should be 
"hive.exec.input.summary.max.threads", to be consistent with other properties 
in HiveConf. This value is not used as is to create a thread pool, it is only 
an upper limit for the thread pool size. If number of input paths is less than 
hive.exec.input.summary.max.threads it will be used instead. It means the 
actual number of threads will be <= hive.exec.input.summary.max.threads.

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-15881
>                 URL: https://issues.apache.org/jira/browse/HIVE-15881
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look it is specific for Hive.
> Also, the above variable is not on HiveConf nor used anywhere else. I just 
> found a reference on the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to