[ https://issues.apache.org/jira/browse/HIVE-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596364#comment-15596364 ]
Eugene Koifman edited comment on HIVE-15032 at 10/21/16 9:20 PM:
-----------------------------------------------------------------

Some pointers from [~jcamachorodriguez]:
{noformat}
Currently the partition pruning optimization is carried out in CBO, while partition column pruner removal (i.e. removal of the parts of a Filter predicate that are no longer necessary because partition pruning kicked in) is still executed in Hive. I could not figure out what your question refers to, but I attach some pointers:
- The partition column property is carried in ExprNodeColumnDesc. Further, Table in org.apache.hadoop.hive.ql.metadata exposes information about the partition columns for each table.
- RelOptHiveTable contains the logic to compute the necessary partition list for a given predicate. This logic relies on PartitionPruner, which can be found in org.apache.hadoop.hive.ql.optimizer.ppr.
- Partition pruning in CBO is carried out by HivePartitionPruneRule, which relies on the aforementioned method in RelOptHiveTable.
{noformat}
{noformat}
It seems to me that it will not be easy to do it directly from UpdateDeleteSemanticAnalyzer, as you are translating AST -> AST and thus the necessary auxiliary structures to resolve the partitions for the query have not been created. However, given an INSERT query, if Calcite has optimized the plan, you would be able to obtain the partitions from SemanticAnalyzer and rewrite to avoid the dynamic partition insert. In particular, you can obtain the list of partitions for a given query from prunedPartitions in SemanticAnalyzer. Once again, this is only valid if Calcite has optimized the plan, as otherwise prunedPartitions is not populated until the partition pruner optimization runs in Hive.
{noformat}
One simple way to fix this is to rewrite the query twice: rewrite Update -> Insert and run SemanticAnalyzer.analyze() on it, as is done currently in UpdateDeleteSemanticAnalyzer, to create the structures Jesus is referring to, and then generate another Insert with the static partition info in it, if any was found (a sketch follows below). This should be well worth it to eliminate DP and "fix" locking along the way.

Another option is to look at the results in SemanticAnalyzer of analyzing the WHERE clause and then modify the structures used in genFileSink() to supply static partition values. Effectively, this teaches the optimizer to recognize that the Insert statement is reading and writing the same partition. This is probably not common outside of Update/Delete/Merge.
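To make the rewrite-twice option concrete, here is a minimal sketch of how the second pass might recover a static partition spec from SemanticAnalyzer's prunedPartitions after the first analyze() call. The helper name and placement are hypothetical (this is not existing Hive code), and it assumes prunedPartitions maps each table alias to its PrunedPartitionList as described above:
{noformat}
import java.util.Map;

import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.parse.PrunedPartitionList;

// Hypothetical helper, for illustration only; it would run in
// UpdateDeleteSemanticAnalyzer after the first-pass analyze() of the
// rewritten Insert. If pruning left exactly one partition, its spec
// (e.g. {p=blah}) can become the static PARTITION (p='blah') clause of
// the second-pass Insert; returning null means "keep dynamic partitions".
Map<String, String> extractStaticPartitionSpec(
    Map<String, PrunedPartitionList> prunedPartitions) {
  if (prunedPartitions.size() != 1) {
    return null; // more than one table referenced; fall back to DP
  }
  PrunedPartitionList ppl = prunedPartitions.values().iterator().next();
  if (ppl.getPartitions().size() != 1) {
    return null; // the WHERE clause did not pin down a single partition
  }
  // Partition.getSpec() returns the partition column -> value map.
  return ppl.getPartitions().iterator().next().getSpec();
}
{noformat}
For the example in the description, the first pass would yield {p=blah}, so the second-pass Insert could use a static partition(p='blah') and the lock manager could take a SHARED_WRITE on just that partition instead of the whole table.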
> Update/Delete statements use dynamic partitions when it's not necessary
> -----------------------------------------------------------------------
>
>                 Key: HIVE-15032
>                 URL: https://issues.apache.org/jira/browse/HIVE-15032
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> {noformat}
> create table if not exists TAB_PART (a int, b int) partitioned by (p string)
> clustered by (a) into 2 buckets stored as orc
> TBLPROPERTIES ('transactional'='true');
>
> insert into TAB_PART partition(p='blah') values(1,2); -- this uses a static partition
>
> update TAB_PART set b = 7 where p = 'blah'; -- this uses DP... WHY?
> {noformat}
> The Update is rewritten into an Insert statement, but SemanticAnalyzer.genFileSink() for this Insert is set up with dynamic partitions.
> At least in theory, we should be able to analyze the WHERE clause so that the Insert doesn't have to use DP.
> Another important side effect of this is how locks are acquired. As it is, a SHARED_WRITE lock is acquired on the whole TAB_PART table.
> However, it would suffice to acquire a SHARED_WRITE lock on the single partition operated on.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)