[jira] [Work logged] (HIVE-25941) Long compilation time of complex query due to analysis for materialized view rewrite

ASF GitHub Bot (Jira) Mon, 11 Apr 2022 06:07:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755214
 ]


ASF GitHub Bot logged work on HIVE-25941:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Apr/22 13:06
            Start Date: 11/Apr/22 13:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847305506


##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java:
##########
@@ -205,4 +212,52 @@ HiveRelOptMaterialization get(String dbName, String viewName) {
   public boolean isEmpty() {
     return materializedViews.isEmpty();
   }
+
+
+  private static class ASTKey {
+    private final ASTNode root;
+
+    public ASTKey(ASTNode root) {
+      this.root = root;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;
+      ASTKey that = (ASTKey) o;
+      return equals(root, that.root);
+    }
+
+    private boolean equals(ASTNode astNode1, ASTNode astNode2) {
+      if (!(astNode1.getType() == astNode2.getType() &&
+              astNode1.getText().equals(astNode2.getText()) &&
+              astNode1.getChildCount() == astNode2.getChildCount())) {
+        return false;
+      }
+
+      for (int i = 0; i < astNode1.getChildCount(); ++i) {
+        if (!equals((ASTNode) astNode1.getChild(i), (ASTNode) astNode2.getChild(i))) {
+          return false;
+        }
+      }
+
+      return true;
+    }
+
+    @Override
+    public int hashCode() {
+      return hashcode(root);

Review Comment:
   * The hashcode of each AST stored in the `MaterializedViewsCache` is calculated 
only once, when the MVs are loaded at hs2 startup or when a new MV is created, 
because the Java hashmap implementation caches the key's hashcode.
   * When we look up a materialization, the hashcode of the look-up key is 
calculated every time the get method is called. For the entire tree, this 
happens only once per query.
   * To find sub-query rewrites, the look-up is done by sub-ASTs, so the 
hashcode is also calculated for the subtrees, but in the performance tests I 
ran locally this was not a bottleneck.
   
   This solution is still much faster than generating the expanded query text 
of every possible sub-query using `UnparseTranslator` and `TokenRewriteStream`.
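
   The hashing behaviour described above can be illustrated with a minimal, 
self-contained sketch. It is not the code from the PR: `Node` is a hypothetical 
stand-in for Hive's `ASTNode`, `Key` only mirrors the shape of `ASTKey`, and the 
`hashCalls` counter and the hash formula are assumptions added for 
demonstration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AstKeySketch {
  /** Hypothetical stand-in for Hive's ASTNode: token type, text, children. */
  static final class Node {
    final int type;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(int type, String text, Node... kids) {
      this.type = type;
      this.text = text;
      for (Node k : kids) {
        children.add(k);
      }
    }
  }

  /** Map key that compares and hashes the whole tree structurally. */
  static final class Key {
    static int hashCalls = 0; // counts hashCode() invocations, illustration only
    final Node root;

    Key(Node root) {
      this.root = root;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Key)) return false;
      return same(root, ((Key) o).root);
    }

    // Recursive structural comparison: type, text, then children in order.
    private static boolean same(Node a, Node b) {
      if (a.type != b.type || !a.text.equals(b.text)
          || a.children.size() != b.children.size()) {
        return false;
      }
      for (int i = 0; i < a.children.size(); i++) {
        if (!same(a.children.get(i), b.children.get(i))) return false;
      }
      return true;
    }

    @Override
    public int hashCode() {
      hashCalls++;
      return hash(root);
    }

    // Walks the full tree, so the cost grows with the number of nodes.
    private static int hash(Node n) {
      int h = 31 * n.type + n.text.hashCode();
      for (Node c : n.children) {
        h = 31 * h + hash(c);
      }
      return h;
    }
  }

  static Node sampleTree() {
    return new Node(1, "TOK_QUERY",
        new Node(2, "TOK_FROM", new Node(3, "t1")),
        new Node(4, "TOK_SELECT"));
  }

  public static void main(String[] args) {
    Map<Key, String> cache = new HashMap<>();
    cache.put(new Key(sampleTree()), "mv1");       // stored key is hashed once
    int afterPut = Key.hashCalls;
    String hit = cache.get(new Key(sampleTree())); // probe key is hashed once
    System.out.println(hit + " " + afterPut + " " + Key.hashCalls);
  }
}
```

   Running `main` prints `mv1 1 2`: one hash computation when the entry is 
stored and one more for the look-up key. `java.util.HashMap` keeps the hash of 
each stored key in its entry, so only the probe key has to be re-hashed on 
`get`.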





Issue Time Tracking
-------------------

    Worklog Id:     (was: 755214)
    Time Spent: 1h 20m  (was: 1h 10m)

> Long compilation time of complex query due to analysis for materialized view 
> rewrite
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-25941
>                 URL: https://issues.apache.org/jira/browse/HIVE-25941
>             Project: Hive
>          Issue Type: Bug
>          Components: Materialized views
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: sample.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When compiling a query, the optimizer tries to rewrite the query plan or 
> subtrees of the plan to use materialized view scans.
> If
> {code}
> set hive.materializedview.rewriting.sql.subquery=false;
> {code}
> the compilation succeeds in less than 10 sec; otherwise it takes several 
> minutes (~ 5 min) depending on the hardware.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
