[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864325#comment-15864325
 ] 

Thomas Poepping commented on HIVE-15881:
----------------------------------------

I spent a little more time looking into this and found the same thing as Sahil. 
I would like to suggest some changes:

I would rename the variable. As I said above, 
{{hive.get.input.listing.num.threads}} isn't exactly clear, and because it is 
used for two things ({{getInputSummary}} and {{getInputPaths}}), I would like to 
propose a different approach:

* Two configuration values, one for each usage:
** {{hive.exec.input.paths.num.threads}} for {{getInputPaths}}
** {{hive.exec.input.summary.num.threads}} for {{getInputSummary}}
* As a bonus, these follow existing {{HiveConf}} naming patterns
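
To make the proposal concrete, here is a rough sketch of how the two keys could be resolved. It uses plain {{java.util.Properties}} instead of {{HiveConf}} so that it stands alone. The class name {{InputThreadConfig}}, the helper {{threadsFor}}, the fallback to the old key while it is deprecated, and the default of 1 are all assumptions for illustration, not existing Hive code.

```java
import java.util.Properties;

// Hypothetical helper; not part of Hive. Illustrates one key per usage.
final class InputThreadConfig {
    // Proposed keys from this comment; they do not exist in HiveConf yet.
    static final String PATHS_KEY = "hive.exec.input.paths.num.threads";
    static final String SUMMARY_KEY = "hive.exec.input.summary.num.threads";
    // Existing key that the issue proposes to deprecate.
    static final String LEGACY_KEY = "mapred.dfsclient.parallelism.max";
    // Assumed default; 1 is only floated as a possibility in this thread.
    static final int DEFAULT_THREADS = 1;

    // Prefer the usage-specific key, then the legacy key, then the default.
    static int threadsFor(Properties conf, String key) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            raw = conf.getProperty(LEGACY_KEY);
        }
        if (raw == null) {
            return DEFAULT_THREADS;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LEGACY_KEY, "8");
        conf.setProperty(SUMMARY_KEY, "4");
        // getInputPaths has no specific value set, so it uses the legacy key.
        System.out.println(threadsFor(conf, PATHS_KEY));
        // getInputSummary has its own value set.
        System.out.println(threadsFor(conf, SUMMARY_KEY));
    }
}
```

With separate keys, {{getInputSummary}} and {{getInputPaths}} can be tuned independently, and existing deployments that still set the old key keep their behavior until it is removed.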

Other questions that should be asked:
* What should the default value be? 
** I'm looking through {{HiveConf}}, but not seeing any other config value that 
might suffice. Maybe we just use 1?

Also, I'm a fan of '0' meaning 'maximum number allowable' with these types of 
config values, but that can be up for discussion. We would need a way to figure 
out what "max allowable" actually _means_, either through another configuration 
value or based on what the OS reports. I don't know which yet.
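
One possible reading of "max allowable", sketched below purely as a discussion aid: treat 0 as "use the number of cores the JVM reports", bounded by a hard ceiling. The class {{MaxAllowableThreads}}, the method {{resolve}}, and the ceiling of 64 are hypothetical; the ceiling could just as well come from another configuration value.

```java
// Hypothetical helper; not part of Hive. Shows one meaning for a value of 0.
final class MaxAllowableThreads {
    // Assumed hard ceiling, chosen arbitrarily for this sketch.
    static final int CEILING = 64;

    // configured: the value read from the config; availableCores: what the
    // runtime reports. Returns the thread count to actually use.
    static int resolve(int configured, int availableCores) {
        if (configured < 0) {
            throw new IllegalArgumentException("thread count must be >= 0");
        }
        if (configured == 0) {
            // 0 means "max allowable": all cores, but at least 1 and at most CEILING.
            return Math.max(1, Math.min(availableCores, CEILING));
        }
        // Explicit values are honored up to the ceiling.
        return Math.min(configured, CEILING);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(resolve(0, cores));
    }
}
```

Whether core count is the right bound for what is mostly I/O-bound listing work is debatable, so this only shows the shape of the rule, not a recommended value.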

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-15881
>                 URL: https://issues.apache.org/jira/browse/HIVE-15881
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to Hive.
> Also, the above variable is neither in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
