[jira] [Updated] (HIVE-20419) Vectorization: Speed up VectorizationDispatcher.validateInputFormatAndSchemaEvolution() for ACIDv2

Gopal V (JIRA) Fri, 17 Aug 2018 20:05:31 -0700


     [ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gopal V updated HIVE-20419:
---------------------------
    Description: 
With ACID tables, the format and schema layouts are much more strictly 
controlled: a table cannot be partly ORC and partly RCFile.

Given that guarantee, the per-partition loop that appears in the sampled stack 
below, along with its slow schema check between partitions, can be skipped 
before vectorizing the operators. Today the worst-case cost is paid in the 
common and correct case, where all of the partitions match.

{code}
HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 621ms
java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 recursive calls>
java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, Object) HashMap.java:1989
java.util.HashMap.putVal(int, Object, Object, boolean, boolean) HashMap.java:637
java.util.HashMap.put(Object, Object) HashMap.java:611
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, VectorPartitionDesc, Map) Vectorizer.java:1272
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) Vectorizer.java:1654
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, boolean) Vectorizer.java:1109
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, Stack, Object[]) Vectorizer.java:961
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) TaskGraphWalker.java:180
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, HashMap) TaskGraphWalker.java:125
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) Vectorizer.java:2442
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, ParseContext, Context) TezCompiler.java:717
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, HashSet, HashSet) TaskCompiler.java:258
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) CalcitePlanner.java:358
{code}
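
To make the proposed shortcut concrete, here is a minimal sketch of the idea, not the actual Vectorizer code. PartitionInfo, sameLayout() and validate() are hypothetical stand-ins invented for illustration (Hive's real types are PartitionDesc and VectorPartitionDesc), and the isAcidTable flag is assumed to be known by the caller.

```java
import java.util.Collections;
import java.util.List;

public class AcidPartitionCheckSketch {
    /** Hypothetical stand-in for a partition's format and schema; not Hive's PartitionDesc. */
    static final class PartitionInfo {
        final String inputFormat;
        final List<String> columnTypes;

        PartitionInfo(String inputFormat, List<String> columnTypes) {
            this.inputFormat = inputFormat;
            this.columnTypes = columnTypes;
        }
    }

    /** Number of per-partition comparisons performed, kept only for illustration. */
    static int comparisons = 0;

    /** Compares the input format and column types of two partitions. */
    static boolean sameLayout(PartitionInfo a, PartitionInfo b) {
        comparisons++;
        return a.inputFormat.equals(b.inputFormat) && a.columnTypes.equals(b.columnTypes);
    }

    /** Returns true when all partitions can be treated as sharing one layout. */
    static boolean validate(List<PartitionInfo> partitions, boolean isAcidTable) {
        if (partitions.isEmpty()) {
            return true;
        }
        if (isAcidTable) {
            // Assumption taken from the description: an ACID table cannot mix
            // formats or schemas, so the per-partition loop is skipped entirely.
            return true;
        }
        PartitionInfo first = partitions.get(0);
        // General path: compare every remaining partition against the first one.
        for (PartitionInfo p : partitions.subList(1, partitions.size())) {
            if (!sameLayout(first, p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PartitionInfo orc = new PartitionInfo("orc", List.of("int", "string"));
        List<PartitionInfo> partitions = Collections.nCopies(1000, orc);

        validate(partitions, false);
        System.out.println("non-ACID comparisons: " + comparisons);

        comparisons = 0;
        validate(partitions, true);
        System.out.println("ACID comparisons: " + comparisons);
    }
}
```

In this sketch the non-ACID path does work proportional to the number of partitions even when they all match, which is the common case described above; the ACID path does none. Whether the real fix should still validate a single representative partition is left open here.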

  was:
With ACID tables, the format and schema layouts are much more strictly 
controlled: a table cannot be partly ORC and partly RCFile.

Given that guarantee, the per-partition loop and its slow schema check between 
partitions can be skipped before vectorizing the operators. Today the 
worst-case cost is paid in the common and correct case, where all of the 
partitions match.


> Vectorization: Speed up 
> VectorizationDispatcher.validateInputFormatAndSchemaEvolution() for ACIDv2
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20419
>                 URL: https://issues.apache.org/jira/browse/HIVE-20419
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Gopal V
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)