[ https://issues.apache.org/jira/browse/HIVE-17124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V reassigned HIVE-17124:
------------------------------

    Assignee: Gopal V

> PlanUtils: Rand() is not a failure-tolerant distribution column
> ---------------------------------------------------------------
>
>                 Key: HIVE-17124
>                 URL: https://issues.apache.org/jira/browse/HIVE-17124
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.3.0, 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-17124.1.patch
>
>
> {code}
> else {
>   // numPartitionFields = -1 means random partitioning
>   partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
> }
> {code}
> This is known to cause data corruption during failure-tolerance operations: an unseeded rand() is
> non-deterministic, so a retried task attempt can send the same rows to different reducers than the
> original attempt did.
> ReduceSinkOperator already contains a failure-tolerant distribution function, which kicks in
> automatically when there are no partition columns:
> {code}
> if (partitionEval.length == 0) {
>   // If no partition cols, just distribute the data uniformly
>   // to provide better load balance. If the requirement is to have a single reducer, we should
>   // set the number of reducers to 1. Use a constant seed to make the code deterministic.
>   if (random == null) {
>     random = new Random(12345);
>   }
>   keyHashCode = random.nextInt();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
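The determinism argument in this issue can be illustrated with a minimal, self-contained Java sketch (not Hive code). The class `RetryDeterminismSketch` and its `assignReducers` helper are hypothetical, and the bucket formula `(hash & Integer.MAX_VALUE) % numReducers` is an assumption modeled on Hadoop's default `HashPartitioner`. Two "attempts" that each start from `new Random(12345)`, as ReduceSinkOperator does, produce identical row-to-reducer assignments, while two unseeded `Random` instances (standing in for an unseeded rand()) almost always disagree.

```java
import java.util.Arrays;
import java.util.Random;

public class RetryDeterminismSketch {

    // Hypothetical helper: assigns each of numRows rows to one of numReducers
    // buckets, mimicking "keyHashCode = random.nextInt()" in ReduceSinkOperator.
    static int[] assignReducers(Random random, int numRows, int numReducers) {
        int[] assignment = new int[numRows];
        for (int row = 0; row < numRows; row++) {
            int keyHashCode = random.nextInt();
            // Assumed bucket formula (HashPartitioner-style): clear the sign bit,
            // then take the remainder so the index is in [0, numReducers).
            assignment[row] = (keyHashCode & Integer.MAX_VALUE) % numReducers;
        }
        return assignment;
    }

    public static void main(String[] args) {
        int numRows = 20;
        int numReducers = 4;

        // Constant seed: a retried attempt replays exactly the same assignments.
        int[] firstAttempt = assignReducers(new Random(12345), numRows, numReducers);
        int[] retryAttempt = assignReducers(new Random(12345), numRows, numReducers);
        System.out.println("seeded attempts match: "
                + Arrays.equals(firstAttempt, retryAttempt));

        // No seed: each attempt draws a fresh sequence, so rows can move
        // to different reducers between the original attempt and the retry.
        int[] unseededFirst = assignReducers(new Random(), numRows, numReducers);
        int[] unseededRetry = assignReducers(new Random(), numRows, numReducers);
        System.out.println("unseeded attempts match: "
                + Arrays.equals(unseededFirst, unseededRetry));
    }
}
```

Running `main` prints `seeded attempts match: true`; the unseeded comparison will almost always print `false`, but that outcome is probabilistic rather than guaranteed.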