[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864325#comment-15864325 ]
Thomas Poepping commented on HIVE-15881:
----------------------------------------

Spent a little more time looking into this, found the same thing as Sahil. I would like to suggest some edits:

I would change the naming of the variable. As I said above, {{hive.get.input.listing.num.threads}} isn't exactly clear, and because this is used for two things ({{getInputSummary}} and {{getInputPaths}}) I would like to propose a different way of doing this.
* Two configuration values, one for each usage:
** {{hive.exec.input.paths.num.threads}} for {{getInputPaths}}
** {{hive.exec.input.summary.num.threads}} for {{getInputSummary}}
* Bonus is that these follow existing {{HiveConf}} patterns

Other questions that should be asked:
* what is the default value?
** I'm looking through {{HiveConf}}, but not seeing any other config value that might suffice. Maybe we just use 1?

Also, I'm a fan of '0' meaning 'maximum number allowable' with these types of config values, but that can be up for discussion. We would need a way to figure out what "max allowable" actually _means_, either through another configuration value or based on the OS structure? I don't know.


> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-15881
>                 URL: https://issues.apache.org/jira/browse/HIVE-15881
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}}
> to get the summary of a list of input locations in parallel. These methods
> are Hive related, but the variable name does not look like it is specific to Hive.
> Also, the above variable is not on HiveConf nor used anywhere else. I just
> found a reference on the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}},
> and use a different variable name, such as
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the
> variable. The removal of the old variable might happen on Hive 3.x



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
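
As a rough illustration of the proposal in the comment above (two keys, a default of 1, and '0' meaning 'maximum number allowable'), the lookup might be sketched in Java as follows. This is only a sketch under stated assumptions: {{java.util.Properties}} stands in for the Hive configuration object, the two {{hive.exec.*}} keys are the names proposed in the comment and not existing {{HiveConf}} entries, and the class name, the fallback to the old key, the cap at {{maxAllowable}}, and passing {{maxAllowable}} in as a parameter are all hypothetical choices, since the thread leaves open how "max allowable" would be determined.

```java
import java.util.Properties;

// Hypothetical helper; not part of Hive.
final class InputThreadConfig {
    // Key names proposed in the comment (assumed, not yet in HiveConf).
    static final String PATHS_KEY = "hive.exec.input.paths.num.threads";
    static final String SUMMARY_KEY = "hive.exec.input.summary.num.threads";
    // Key the issue proposes to deprecate.
    static final String DEPRECATED_KEY = "mapred.dfsclient.parallelism.max";
    // Default floated in the comment ("Maybe we just use 1?").
    static final int DEFAULT_THREADS = 1;

    private InputThreadConfig() {}

    /**
     * Resolves a thread count for one usage.
     * Order: new key, then the deprecated key, then the default.
     * A value of 0 is read as "use maxAllowable" (assumed semantics);
     * larger values are capped at maxAllowable.
     */
    static int resolveThreads(Properties conf, String newKey, int maxAllowable) {
        String raw = conf.getProperty(newKey);
        if (raw == null) {
            raw = conf.getProperty(DEPRECATED_KEY);
        }
        int requested = (raw == null) ? DEFAULT_THREADS : Integer.parseInt(raw.trim());
        if (requested < 0) {
            throw new IllegalArgumentException("thread count must be >= 0: " + requested);
        }
        if (requested == 0) {
            return maxAllowable;
        }
        return Math.min(requested, maxAllowable);
    }
}
```

A caller for {{getInputPaths}} would pass {{InputThreadConfig.PATHS_KEY}} and one for {{getInputSummary}} would pass {{InputThreadConfig.SUMMARY_KEY}}, so the two usages can be tuned separately while existing settings of the old key keep working during the deprecation window.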