[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-20419:
---------------------------
    Description:
With ACID table, the format and schema layouts are much more strictly controlled - the table cannot be made of partial ORC and partial RCFile.

This assumption can remove this loop and the slow check for schema between each partition before vectorizing the operators - the worst-case performance is the common & correct case, where all of them match.

{code}
HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 621ms
java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 recursive calls>
java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, Object) HashMap.java:1989
java.util.HashMap.putVal(int, Object, Object, boolean, boolean) HashMap.java:637
java.util.HashMap.put(Object, Object) HashMap.java:611
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, VectorPartitionDesc, Map) Vectorizer.java:1272
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) Vectorizer.java:1654
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, boolean) Vectorizer.java:1109
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, Stack, Object[]) Vectorizer.java:961
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) TaskGraphWalker.java:180
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, HashMap) TaskGraphWalker.java:125
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) Vectorizer.java:2442
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, ParseContext, Context) TezCompiler.java:717
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, HashSet, HashSet) TaskCompiler.java:258
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) CalcitePlanner.java:358
{code}

  was:
With ACID table, the format and schema layouts are much more strictly controlled - the table cannot be made of partial ORC and partial RCFile.

This assumption can remove this loop and the slow check for schema between each partition before vectorizing the operators - the worst-case performance is the common & correct case, where all of them match.

> Vectorization: Speed up VectorizationDispatcher.validateInputFormatAndSchemaEvolution() for ACIDv2
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20419
>                 URL: https://issues.apache.org/jira/browse/HIVE-20419
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Gopal V
>            Priority: Major
>
> With ACID table, the format and schema layouts are much more strictly
> controlled - the table cannot be made of partial ORC and partial RCFile.
> This assumption can remove this loop and the slow check for schema between
> each partition before vectorizing the operators - the worst-case performance
> is the common & correct case, where all of them match.
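
The shortcut proposed in the description can be illustrated with a small, self-contained sketch. This is not Hive's code: `Partition`, `matchesTable`, `validate` and the `checksPerformed` counter are hypothetical stand-ins for the per-partition work done under `validateInputFormatAndSchemaEvolution()`, and the sketch assumes (as the description does) that every partition of an ACID table shares one input format and one schema. Under that assumption, checking a single representative partition would be enough, while non-ACID tables keep the full loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical stand-ins, not Hive's
// PartitionDesc / VectorPartitionDesc / Vectorizer classes.
class PartitionValidationSketch {

    /** Hypothetical partition descriptor: input format name plus column type names. */
    static final class Partition {
        final String inputFormat;
        final List<String> columnTypes;

        Partition(String inputFormat, List<String> columnTypes) {
            this.inputFormat = inputFormat;
            this.columnTypes = columnTypes;
        }
    }

    /** Number of per-partition checks performed, to make the cost visible. */
    static int checksPerformed = 0;

    /** Per-partition check: same input format and same column types as the table. */
    static boolean matchesTable(Partition p, String tableFormat, List<String> tableTypes) {
        checksPerformed++;
        return p.inputFormat.equals(tableFormat) && p.columnTypes.equals(tableTypes);
    }

    /**
     * Validates partitions before vectorizing. For an ACID table (assumed to have a
     * uniform format and schema) only the first partition is checked; otherwise
     * every partition is checked.
     */
    static boolean validate(List<Partition> partitions, boolean isAcid,
                            String tableFormat, List<String> tableTypes) {
        for (Partition p : partitions) {
            if (!matchesTable(p, tableFormat, tableTypes)) {
                return false;
            }
            if (isAcid) {
                break; // uniform layout assumed: one representative is enough
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("int", "string");
        List<Partition> partitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partitions.add(new Partition("ORC", types));
        }

        checksPerformed = 0;
        validate(partitions, false, "ORC", types);
        System.out.println("non-ACID checks: " + checksPerformed);

        checksPerformed = 0;
        validate(partitions, true, "ORC", types);
        System.out.println("ACID checks: " + checksPerformed);
    }
}
```

With 1000 matching partitions, the non-ACID path performs 1000 checks and the ACID path performs one. The all-match case is exactly where the full loop costs the most, which is the point the description makes about worst-case performance being the common case.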