BELUGA BEHR created HIVE-16758:
----------------------------------

             Summary: Better Select Number of Replications
                 Key: HIVE-16758
                 URL: https://issues.apache.org/jira/browse/HIVE-16758
             Project: Hive
          Issue Type: Improvement
            Reporter: BELUGA BEHR
            Priority: Minor

{{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}

We should be smarter about how we pick the replication factor. We should add a new configuration property equivalent to {{mapreduce.client.submit.file.replication}}. Its value should be around the square root of the number of nodes, rather than being hard-coded as it is today:

{code}
  public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
  private int minReplication = 10;

  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
    ...
    int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
    // minReplication value should not cross the value of dfs.replication.max
    minReplication = Math.min(minReplication, dfsMaxReplication);
  }
{code}

https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
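
To illustrate the square-root rule of thumb, here is a minimal, standalone sketch. It is not Hive code: the class {{ReplicationChooser}} and the method {{chooseReplication}} are hypothetical names, and how the node count would be obtained at runtime is left open (it is simply passed in). The only behavior carried over from the snippet above is that the result must not exceed {{dfs.replication.max}}.

```java
// Hypothetical helper for illustration only; not part of Hive or Hadoop.
class ReplicationChooser {

    /**
     * Picks a replication factor near sqrt(numNodes), clamped to the range
     * [1, dfsMaxReplication].
     *
     * @param numNodes          assumed number of nodes in the cluster
     * @param dfsMaxReplication value of dfs.replication.max
     */
    static int chooseReplication(int numNodes, int dfsMaxReplication) {
        if (numNodes < 1 || dfsMaxReplication < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        // Round up so that small clusters still get at least one replica.
        int sqrt = (int) Math.ceil(Math.sqrt(numNodes));
        // Never cross the cluster-wide maximum.
        return Math.min(sqrt, dfsMaxReplication);
    }

    public static void main(String[] args) {
        System.out.println(chooseReplication(100, 512));   // 10
        System.out.println(chooseReplication(400, 512));   // 20
        System.out.println(chooseReplication(10000, 50));  // 50 (capped)
    }
}
```

With 100 nodes this yields the same value as the current hard-coded 10, while larger clusters get proportionally more replicas, up to the configured maximum.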