Gopal V created HIVE-12491:
------------------------------

Summary: Statistics: 3 attribute join on a 2-source table is off
Key: HIVE-12491
URL: https://issues.apache.org/jira/browse/HIVE-12491
Project: Hive
Issue Type: Bug
Reporter: Gopal V
Assignee: Prasanth Jayachandran
The eased-out denominator has to detect duplicate row stats that come from different attributes of the same table.

{code}
private Long getEasedOutDenominator(List<Long> distinctVals) {
  // Exponential back-off for NDVs.
  // 1) Descending order sort of NDVs
  // 2) denominator = NDV1 * (NDV2 ^ (1/2)) * (NDV3 ^ (1/4)) * ....
  Collections.sort(distinctVals, Collections.reverseOrder());

  long denom = distinctVals.get(0);
  for (int i = 1; i < distinctVals.size(); i++) {
    denom = (long) (denom * Math.pow(distinctVals.get(i), 1.0 / (1 << i)));
  }

  return denom;
}
{code}

This gets {{[8007986, 821974390, 821974390]}}, which is actually 3 columns, 2 of which are from the RHS table. After the descending sort, the duplicated RHS NDV lands at index 1 and is multiplied in a second time as sqrt(821974390) ≈ 28,670. So the eased-out denominator is off by a factor of roughly 30,000. Because this denominator divides the join's row-count estimate, the estimate is too small by the same factor, which causes OOMs in map-joins.
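The arithmetic above can be reproduced with a small standalone Java sketch. It re-implements the back-off loop from the ticket as a hypothetical static helper `easedOutDenominator` in a hypothetical `EasedOutDenominatorDemo` class (neither is Hive's actual API). Unlike the original, it sorts a copy instead of sorting the caller's list in place. It prints the denominator for the reported NDVs together with the square-root factor contributed by the duplicated RHS NDV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EasedOutDenominatorDemo {

    // Hypothetical re-implementation of the exponential back-off from the ticket:
    // sort NDVs in descending order, then denom = NDV1 * NDV2^(1/2) * NDV3^(1/4) * ...
    // Works on a copy so the caller's list is not reordered.
    static long easedOutDenominator(List<Long> distinctVals) {
        List<Long> sorted = new ArrayList<>(distinctVals);
        sorted.sort(Collections.reverseOrder());

        long denom = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            // Exponent halves at each position: 1/2, 1/4, 1/8, ...
            denom = (long) (denom * Math.pow(sorted.get(i), 1.0 / (1 << i)));
        }
        return denom;
    }

    public static void main(String[] args) {
        // NDVs reported in HIVE-12491: one LHS column and two RHS columns with the same NDV.
        List<Long> ndvs = Arrays.asList(8007986L, 821974390L, 821974390L);
        System.out.println("eased-out denominator = " + easedOutDenominator(ndvs));

        // After the sort, the duplicated RHS NDV sits at index 1 and is
        // multiplied in as NDV^(1/2).
        System.out.printf("factor from duplicate = %.0f%n", Math.sqrt(821974390.0));
    }
}
```

Copying the list before sorting is a small design choice for the sketch: the original sorts `distinctVals` in place, which also reorders the caller's list as a side effect.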