[ https://issues.apache.org/jira/browse/HIVE-24854?focusedWorklogId=579720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579720 ]

ASF GitHub Bot logged work on HIVE-24854:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Apr/21 03:38
            Start Date: 09/Apr/21 03:38
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #2119:
URL: https://github.com/apache/hive/pull/2119#discussion_r610294598



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveJoinIncrementalRewritingRule.java
##########
@@ -0,0 +1,86 @@
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.core.Union;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveJoinIncrementalRewritingRule extends RelOptRule {

Review comment:
       Could we add a class-level comment with an explanation of what this rule does?
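   For context, a hypothetical sketch of the kind of class comment this could get, based on the behaviour described in the JIRA (the wording is an assumption, not the patch's documentation):

```java
/**
 * Planner rule that rewrites the full-rebuild plan of a join-based (non-aggregate)
 * materialized view into an incremental one: a Union whose first branch produces
 * the rows to insert into the MV and whose second branch produces the rows to be
 * written to the MV's delete delta.
 */
```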

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFieldTrimmerRule.java
##########
@@ -71,6 +74,11 @@ public void onMatch(RelOptRuleCall call) {
     triggered = true;
   }
 
+  protected boolean canGo(RelOptRuleCall call, RelNode node) {

Review comment:
       `canGo` -> `shouldRun`?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveRowIsDeletedPropagator.java
##########
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelShuttle;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelShuttleImpl;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveJoin;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * {@link HiveRelShuttle} to propagate rowIsDeleted column to all 
HiveRelNodes' rowType in the plan.
+ * General rule: we expect that the rowIsDeleted column is the last column in 
the input rowType of the current
+ * {@link 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveRelNode}.
+ */
+public class HiveRowIsDeletedPropagator extends HiveRelShuttleImpl {

Review comment:
       Isn't the propagator missing the Filter operator? Could we add an incremental
maintenance test for an MV whose query definition has a Filter on top of the TS
with the deleted row?
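   For illustration, a minimal sketch of what handling Filter could look like, assuming `HiveRelShuttle` exposes a `visit(HiveFilter)` overload (not the patch's code):

```java
// Propagate the appended rowIsDeleted column through a Filter: rewrite the input
// first, then rebuild the Filter on top of it. Because the new column is appended
// at the end of the row type, the existing input refs in the filter condition
// remain valid and the condition can be reused unchanged.
@Override
public RelNode visit(HiveFilter filter) {
  RelNode newInput = filter.getInput().accept(this);
  return filter.copy(filter.getTraitSet(), newInput, filter.getCondition());
}
```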

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -15280,7 +15283,8 @@ public Operator getSinkOp() {
     NONE,
     INSERT_OVERWRITE_REBUILD,
     AGGREGATE_REBUILD,
-    NO_AGGREGATE_REBUILD
+    NO_AGGREGATE_REBUILD,
+    JOIN_REBUILD

Review comment:
       Probably these names should be changed. A proposal:
   `AGGREGATE_REBUILD` -> `AGGREGATE_INSERT_REBUILD`
   `NO_AGGREGATE_REBUILD` -> `JOIN_INSERT_REBUILD`
   `JOIN_REBUILD` -> `JOIN_INSERT_DELETE_REBUILD`
   Thoughts?
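   For reference, the proposal applied to the enum (assuming this is `SemanticAnalyzer.MaterializationRebuildMode`; illustrative only):

```java
public enum MaterializationRebuildMode {
  NONE,
  INSERT_OVERWRITE_REBUILD,
  AGGREGATE_INSERT_REBUILD,      // was AGGREGATE_REBUILD
  JOIN_INSERT_REBUILD,           // was NO_AGGREGATE_REBUILD
  JOIN_INSERT_DELETE_REBUILD     // was JOIN_REBUILD
}
```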

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -1286,6 +1289,181 @@ private void 
fixUpASTAggregateIncrementalRebuild(ASTNode newAST) throws Semantic
     ctx.addDestNamePrefix(2, Context.DestClausePrefix.INSERT);
   }
 
+  private void fixUpASTJoinIncrementalRebuild(ASTNode newAST) throws 
SemanticException {

Review comment:
       Maybe these related `fixUp*` methods should be moved outside of this 
class (`HiveMaterializedViewUtils`?). Feel free to do it in a follow-up JIRA 
though.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -5677,7 +5881,7 @@ private RexNode getRexNodeCached(ASTNode node, 
RowResolver input,
     for (ASTNode node : fieldDescList) {
       Map<ASTNode, String> map = translateFieldDesc(node);
       for (Entry<ASTNode, String> entry : map.entrySet()) {
-        unparseTranslator.addTranslation(entry.getKey(), entry.getValue());
+        unparseTranslator.addTranslation(entry.getKey(), 
entry.getValue().toLowerCase());

Review comment:
       Is this necessary / always correct? Can we add a comment with an explanation?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java
##########
@@ -263,6 +279,10 @@ public boolean isInsideView() {
     return insideView;
   }
 
+  public boolean isFetchDeletedRows() {
+    return fetchDeletedRows;
+  }
+
   // We need to include isInsideView inside digest to differentiate direct
   // tables and tables inside view. Otherwise, Calcite will treat them as the 
same.
   // Also include partition list key to trigger cost evaluation even if an

Review comment:
       The `computeDigest` method should include `fetchDeletedRows` in the computed digest.
   Otherwise, if a plan had two TS operators on the same table, one with
`fetchDeletedRows` set to `true` and the other set to `false`, the planner might
merge them and keep only a single one.
   Maybe you can try to reproduce this with a custom test and add it to the patch.
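   A minimal sketch of the idea, collapsed to just the new piece (the existing digest items such as `isInsideView` would of course stay; hypothetical code, not the patch's):

```java
@Override
public String computeDigest() {
  // append fetchDeletedRows so that two scans of the same table that differ only
  // in this flag get different digests and are not merged by the planner
  return super.computeDigest() + "[fetchDeletedRows: " + fetchDeletedRows + "]";
}
```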

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveRowIsDeletedPropagatorRule.java
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFieldTrimmerRule;
+
+/**
+ * Planner rule to propagate rowIsDeleted column in Materialized View 
incremental rebuild plans.
+ * This rule uses {@link HiveRowIsDeletedPropagator}.
+ * This rule should be applied on the uppermost Right outer join node.
+ */
+public class HiveRowIsDeletedPropagatorRule extends HiveFieldTrimmerRule {

Review comment:
       Is it overkill that this rule extends `HiveFieldTrimmerRule`, since the two are
unrelated? Should they instead both extend a common class that provides the mechanism
to unnest the plan and trigger only once?
   
   I think a much simpler alternative is to use `HiveRowIsDeletedPropagator` directly
(no rule), since it already implements the traversal logic. You could control the flow
internally with a boolean that indicates whether the traversal has already passed
through the RIGHT join. That would also avoid any changes in `HiveFieldTrimmerRule`
and simplify the patch.
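   A rough sketch of that alternative (the field and method bodies here are hypothetical, not the patch's code; other `visit` methods would consult the same flag before rewriting their node):

```java
public class HiveRowIsDeletedPropagator extends HiveRelShuttleImpl {
  // set once the traversal has passed the topmost RIGHT outer join, so the
  // rowIsDeleted propagation only happens below that point
  private boolean belowTopRightOuterJoin = false;

  @Override
  public RelNode visit(HiveJoin join) {
    if (!belowTopRightOuterJoin && join.getJoinType() == JoinRelType.RIGHT) {
      belowTopRightOuterJoin = true;  // everything below this join gets rewritten
      return visitChildren(join);
    }
    if (!belowTopRightOuterJoin) {
      return visitChildren(join);     // above the RIGHT join: leave the node as is
    }
    // ... existing rowIsDeleted propagation logic for joins ...
    return visitChildren(join);
  }
}
```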




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 579720)
    Time Spent: 0.5h  (was: 20m)

> Incremental Materialized view refresh in presence of update/delete operations
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24854
>                 URL: https://issues.apache.org/jira/browse/HIVE-24854
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current implementation of incremental materialized view rebuild cannot be used
> if any of the materialized view's source tables has had update or delete operations
> since the last rebuild. In such cases a full rebuild should be performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row deleted
> 2. Execute the query in the view definition 
> 2.a. Add a filter to each table scan to pull only the rows from each source table
> that have a higher writeId than the writeId of the last rebuild - this is already
> implemented by the current incremental rebuild.
> 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, if any
> of the branches has a deleted row, the result row is also marked as deleted (a
> sketch of this idea follows below).
> We should distinguish two types of view definition queries: with and without
> Aggregate.
> 3.a. No-aggregate path:
> Rewrite the plan of the full rebuild into a multi-insert statement with two insert
> branches: one branch inserts new rows into the materialized view table, and the
> second one inserts deleted rows into the materialized view's delete delta.
> 3.b Aggregate path: TBD
> Prerequisite:
> Source tables haven't been compacted since the last MV rebuild.
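A minimal, self-contained sketch of the idea in step 2.b above (the class and method names are hypothetical, not part of the patch): when two inputs are joined, the joined row is marked deleted if the row from either branch is deleted.

```java
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public final class RowIsDeletedSketch {
  private RowIsDeletedSketch() {
  }

  // Hypothetical helper: combine the rowIsDeleted markers of the two join inputs
  // into the marker of the joined output row.
  public static RexNode joinedRowIsDeleted(RexBuilder rexBuilder,
      RexNode leftRowIsDeleted, RexNode rightRowIsDeleted) {
    // a joined row is deleted if the row from either branch is deleted
    return rexBuilder.makeCall(SqlStdOperatorTable.OR, leftRowIsDeleted, rightRowIsDeleted);
  }
}
```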



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
