Hi all,

I'm working on HIVE-5595 to add vectorization support for SMB join operators. The problem I'm facing is that the vectorized record readers (e.g. VectorizedOrcRecordReader) depend on MapWork.pathToPartitionInfo (see VectorizedRowBatchCtx.init).

What I discovered, though, is that for SMB join plans this map (along with the related pathToAliases map) is incomplete. During population, which occurs in GenMapRedUtils.setTaskPlan, aliasToPartnInfo is always populated:

    plan.getAliasToPartnInfo().put(alias_id, aliasPartnDesc);

but the pathToAliases and pathToPartitionInfo maps are skipped in the local case:

    if (!local) {
      while (iterPath.hasNext()) {
        ...
        plan.getPathToAliases().get(path).add(alias_id);
        plan.getPathToPartitionInfo().put(path, prtDesc);
        ...

And local in this case, for the 'small' alias, is true. It is set further up the call stack by MapJoinFactory$TableScanMapJoinProcessor.process:

    boolean local = pos != mapJoin.getConf().getPosBigTable();
    if (oldTask == null) {
      assert currPlan.getReduceWork() == null;
      initMapJoinPlan(mapJoin, currTask, ctx, local);

My question is for the SMB/MapJoin experts, to clarify this anomaly. An SMB join is not local, but it is treated as local. As a result, the plan info is inconsistent: the aforementioned maps are incomplete. Is local=true intentional in the SMB case, or is it just a leftover from the original MapJoin implementation? Should SMB join set it to false, or will the sky collapse? I can think of several 'workarounds', but there is too much context here that I don't fully grok.

Relevant stack:

    GenMapRedUtils.setTaskPlan(String, Operator<OperatorDesc>, Task<?>, boolean, GenMRProcContext, PrunedPartitionList) line: 658
    GenMapRedUtils.setTaskPlan(String, Operator<OperatorDesc>, Task<?>, boolean, GenMRProcContext) line: 400
    MapJoinFactory$TableScanMapJoinProcessor.initMapJoinPlan(AbstractMapJoinOperator<MapJoinDesc>, Task<Serializable>, GenMRProcContext, boolean) line: 157
    MapJoinFactory$TableScanMapJoinProcessor.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 219
    DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
    GenMapRedWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
    GenMapRedWalker.walk(Node) line: 54
    GenMapRedWalker.walk(Node) line: 65
    GenMapRedWalker.walk(Node) line: 65
    GenMapRedWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node,Object>) line: 109
    MapReduceCompiler.compile(ParseContext, List<Task<Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 267
    SemanticAnalyzer.analyzeInternal(ASTNode) line: 8927

Thanks,
~Remus
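
P.S. To make the symptom concrete, here is a small self-contained toy model of the population logic described above. It is only a sketch and not Hive code: ToyPlan, buildSmbPlan, the plain String descriptors and the /warehouse/... paths are all made up for illustration, and only the shape of the "if (!local)" guard and the "pos != posBigTable" test mirrors the snippets quoted in this mail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SmbPlanSketch {
    /** Hypothetical stand-in for MapWork; holds only the three maps discussed. */
    static class ToyPlan {
        final Map<String, String> aliasToPartnInfo = new LinkedHashMap<>();
        final Map<String, List<String>> pathToAliases = new LinkedHashMap<>();
        final Map<String, String> pathToPartitionInfo = new LinkedHashMap<>();
    }

    /** Simplified model of the guard quoted from GenMapRedUtils.setTaskPlan. */
    static void setTaskPlan(ToyPlan plan, String aliasId, List<String> paths,
                            String partDesc, boolean local) {
        // The alias map is always populated.
        plan.aliasToPartnInfo.put(aliasId, partDesc);
        // The path-keyed maps are populated only for non-local aliases.
        if (!local) {
            for (String path : paths) {
                plan.pathToAliases
                    .computeIfAbsent(path, p -> new ArrayList<>())
                    .add(aliasId);
                plan.pathToPartitionInfo.put(path, partDesc);
            }
        }
    }

    /** Builds a two-table plan where position 0 is the big table. */
    static ToyPlan buildSmbPlan() {
        ToyPlan plan = new ToyPlan();
        int posBigTable = 0;
        String[] aliases = {"big", "small"};
        String[] paths = {"/warehouse/big", "/warehouse/small"}; // made-up paths
        for (int pos = 0; pos < aliases.length; pos++) {
            // Same test as in TableScanMapJoinProcessor.process.
            boolean local = pos != posBigTable;
            setTaskPlan(plan, aliases[pos],
                        Collections.singletonList(paths[pos]),
                        "desc-" + aliases[pos], local);
        }
        return plan;
    }

    public static void main(String[] args) {
        ToyPlan plan = buildSmbPlan();
        System.out.println(plan.aliasToPartnInfo.keySet());    // [big, small]
        System.out.println(plan.pathToPartitionInfo.keySet()); // [/warehouse/big]
        // A reader that looks up its split path for the small table finds nothing:
        System.out.println(plan.pathToPartitionInfo.get("/warehouse/small")); // null
    }
}
```

In this model the small alias is known to the plan by alias but not by path, which is the lookup a path-keyed consumer such as VectorizedRowBatchCtx.init would rely on.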