[ https://issues.apache.org/jira/browse/HIVE-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159948#comment-15159948 ]
Ashutosh Chauhan commented on HIVE-13096:
-----------------------------------------

Instead of recursively adding the cardinality of the tree for each operator, I think the following heuristic might be better:
{code}
getCummCardinality (Operator op) {
  if (op.type == join) {
    cummCardinality += maxCardinality from all branches;
  } else {
    return cummCardinality;
  }
}
{code}
That is to say, the cardinality of any operator other than a join does not contribute to the cumulative cardinality. For a join, the max cardinality of its inputs contributes to the cumulative cardinality of the tree. This is akin to what we have on the logical side, where getCumulativeCost() is overridden only for join, in the manner suggested here.

> Cost to choose side table in MapJoin conversion based on cumulative cardinality
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-13096
>                 URL: https://issues.apache.org/jira/browse/HIVE-13096
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13096.01.patch, HIVE-13096.patch
>
>
> HIVE-11954 changed the logic to choose the side table in the MapJoin conversion algorithm. The initial heuristic for the cost was based on the number of heavyweight operators.
> This extends that work so the heuristic is based on accumulated cardinality.
> In the future, we should choose the side based on total latency for the input.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
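
For readers who want to try the heuristic from the comment above on a toy plan, here is a minimal Python sketch of one possible reading of it. The `Op` class, its `kind`/`rows`/`inputs` fields, and the function name `cumulative_cardinality` are hypothetical stand-ins and are not Hive's `Operator` API. The pseudocode does not say how the totals of a join's branches are combined, so this sketch assumes they are summed, which means every join in the subtree contributes exactly once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical operator node for illustration; not Hive's Operator class."""
    kind: str                 # e.g. "scan", "filter", "join"
    rows: int                 # estimated output row count of this operator
    inputs: List["Op"] = field(default_factory=list)


def cumulative_cardinality(op: Op) -> int:
    # Total already accumulated below this operator.
    # Assumption: totals of the input branches are summed.
    below = sum(cumulative_cardinality(child) for child in op.inputs)
    if op.kind == "join" and op.inputs:
        # Only a join adds to the total: the largest row count among its inputs.
        return below + max(child.rows for child in op.inputs)
    # Any other operator contributes nothing of its own.
    return below


# Toy plan: (small JOIN filtered_big) JOIN tiny
small = Op("scan", 100)
filtered_big = Op("filter", 5000, [Op("scan", 20000)])
inner_join = Op("join", 4000, [small, filtered_big])
outer_join = Op("join", 3000, [inner_join, Op("scan", 10)])

print(cumulative_cardinality(outer_join))  # 9000 = max(100, 5000) + max(4000, 10)
```

Note that the 20000-row scan never enters the total directly: only the 5000 rows that survive the filter reach the join, which is the point of ignoring non-join operators.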