[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early

ASF GitHub Bot (Jira) Fri, 03 Jun 2022 07:27:16 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=778105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778105
 ]


ASF GitHub Bot logged work on HIVE-21160:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/22 14:26
            Start Date: 03/Jun/22 14:26
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on code in PR #2855:
URL: https://github.com/apache/hive/pull/2855#discussion_r888964933


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java:
##########
@@ -264,9 +257,104 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws 
SemanticException {
     }
   }
 
-  private void validateTxnManager(Table mTable) throws SemanticException {
-    if (!AcidUtils.acidTableWithoutTransactions(mTable) && 
!getTxnMgr().supportsAcid()) {
-      throw new SemanticException(ErrorMsg.ACID_OP_ON_NONACID_TXNMGR.getMsg());
+  private void analyzeSplitUpdate(ASTNode tree, Table mTable, ASTNode 
tabNameNode) throws SemanticException {
+    operation = Context.Operation.UPDATE;
+
+    List<? extends Node> children = tree.getChildren();
+
+    ASTNode where = null;
+    int whereIndex = 2;
+    if (children.size() > whereIndex) {
+      where = (ASTNode) children.get(whereIndex);
+      assert where.getToken().getType() == HiveParser.TOK_WHERE :
+              "Expected where clause, but found " + where.getName();
+    }
+
+    Set<String> setRCols = new LinkedHashSet<>();
+//    TOK_UPDATE_TABLE
+//            TOK_TABNAME
+//               ...
+//            TOK_SET_COLUMNS_CLAUSE <- The set list from update should be the 
second child (index 1)
+    assert children.size() >= 2 : "Expected update token to have at least two 
children";
+    ASTNode setClause = (ASTNode) children.get(1);
+    Map<String, ASTNode> setCols = collectSetColumnsAndExpressions(setClause, 
setRCols, mTable);
+    Map<Integer, ASTNode> setColExprs = new 
HashMap<>(setClause.getChildCount());
+
+    List<FieldSchema> nonPartCols = mTable.getCols();
+    Map<String, String> colNameToDefaultConstraint = 
getColNameToDefaultValueMap(mTable);
+    List<String> values = new ArrayList<>(mTable.getCols().size());
+    StringBuilder rewrittenQueryStr = createRewrittenQueryStrBuilder();

Review Comment:
   note: I always feeled that we need some `HiveSqlQueryBuilder` ; we seem to 
resort to some concatenator to build our own rewritten queries....it would have 
come handy here as well :D
   



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java:
##########
@@ -264,9 +257,104 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws 
SemanticException {
     }
   }
 
-  private void validateTxnManager(Table mTable) throws SemanticException {
-    if (!AcidUtils.acidTableWithoutTransactions(mTable) && 
!getTxnMgr().supportsAcid()) {
-      throw new SemanticException(ErrorMsg.ACID_OP_ON_NONACID_TXNMGR.getMsg());
+  private void analyzeSplitUpdate(ASTNode tree, Table mTable, ASTNode 
tabNameNode) throws SemanticException {
+    operation = Context.Operation.UPDATE;
+
+    List<? extends Node> children = tree.getChildren();
+
+    ASTNode where = null;
+    int whereIndex = 2;
+    if (children.size() > whereIndex) {
+      where = (ASTNode) children.get(whereIndex);
+      assert where.getToken().getType() == HiveParser.TOK_WHERE :
+              "Expected where clause, but found " + where.getName();
+    }
+
+    Set<String> setRCols = new LinkedHashSet<>();
+//    TOK_UPDATE_TABLE
+//            TOK_TABNAME
+//               ...
+//            TOK_SET_COLUMNS_CLAUSE <- The set list from update should be the 
second child (index 1)
+    assert children.size() >= 2 : "Expected update token to have at least two 
children";
+    ASTNode setClause = (ASTNode) children.get(1);
+    Map<String, ASTNode> setCols = collectSetColumnsAndExpressions(setClause, 
setRCols, mTable);
+    Map<Integer, ASTNode> setColExprs = new 
HashMap<>(setClause.getChildCount());
+
+    List<FieldSchema> nonPartCols = mTable.getCols();
+    Map<String, String> colNameToDefaultConstraint = 
getColNameToDefaultValueMap(mTable);
+    List<String> values = new ArrayList<>(mTable.getCols().size());
+    StringBuilder rewrittenQueryStr = createRewrittenQueryStrBuilder();
+    rewrittenQueryStr.append("(SELECT ROW__ID");
+    for (int i = 0; i < nonPartCols.size(); i++) {
+      rewrittenQueryStr.append(',');
+      String name = nonPartCols.get(i).getName();
+      ASTNode setCol = setCols.get(name);
+      String identifier = HiveUtils.unparseIdentifier(name, this.conf);
+
+      if (setCol != null) {
+        if (setCol.getType() == HiveParser.TOK_TABLE_OR_COL &&
+                setCol.getChildCount() == 1 && setCol.getChild(0).getType() == 
HiveParser.TOK_DEFAULT_VALUE) {
+          rewrittenQueryStr.append(colNameToDefaultConstraint.get(name));

Review Comment:
   I don't know in what form we are storing the constraint defaults; I think 
they are escaped...because they are run thru a `ParseDriver` - so this should 
be ok.
   
   do we have this path covered with some test? I haven't seen any..
   
   I wonder if we need to take care of replacing the `DEFAULT` at this point - 
as the query will be parsed again...won't that be able to handle these defaults?



##########
ql/src/test/results/clientpositive/llap/update_where_partitioned.q.out:
##########
@@ -63,15 +63,20 @@ PREHOOK: type: QUERY
 PREHOOK: Input: default@acid_uwp
 PREHOOK: Input: default@acid_uwp@ds=today
 PREHOOK: Input: default@acid_uwp@ds=tomorrow
+PREHOOK: Output: default@acid_uwp
 PREHOOK: Output: default@acid_uwp@ds=today
 PREHOOK: Output: default@acid_uwp@ds=tomorrow
 POSTHOOK: query: update acid_uwp set b = 'fred' where b = 
'k17Am8uPHWk02cEf1jet'
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@acid_uwp
 POSTHOOK: Input: default@acid_uwp@ds=today
 POSTHOOK: Input: default@acid_uwp@ds=tomorrow
+POSTHOOK: Output: default@acid_uwp
+POSTHOOK: Output: default@acid_uwp@ds=today
 POSTHOOK: Output: default@acid_uwp@ds=today
 POSTHOOK: Output: default@acid_uwp@ds=tomorrow
+POSTHOOK: Lineage: acid_uwp PARTITION(ds=today).a SIMPLE 
[(acid_uwp)acid_uwp.FieldSchema(name:a, type:int, comment:null), ]

Review Comment:
   w.r.t to security stuff; this kinda means that we will be "reading" column 
`A` in case of an update of column `B`; I think an interesting question would 
be:
   should we support the case of creating ranger policies in which a user is 
allowed to update a column (b) in a table; but not allowed to read some other 
column (`a`) - I think right now we support this; should we keep doing that? I 
don't know how complicated would be to omit the columns we are touching only 
because of this rewrite
   
   



##########
ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out:
##########
@@ -6,4 +6,4 @@ POSTHOOK: query: create table auth_noupd(i int, j int) 
clustered by (j) into 2 b
 POSTHOOK: type: CREATETABLE
 POSTHOOK: Output: database:default
 POSTHOOK: Output: default@auth_noupd
-FAILED: HiveAccessControlException Permission denied: Principal [name=user1, 
type=USER] does not have following privileges for operation QUERY [[SELECT] on 
Object [type=TABLE_OR_VIEW, name=default.auth_noupd], [UPDATE] on Object 
[type=TABLE_OR_VIEW, name=default.auth_noupd]]

Review Comment:
   earlier it was checking for `UPDATE` + `SELECT` ; now it checks for 
`INSERT`+`SELECT`+`DELETE` 
   
   this change might be unexpected to people working on authorization....not 
sure what we could do about it;
   but I think in its current form; depending on the feature toggle we will be 
checking: UPDATE/SELECT+DELETE - which could be unexpected



##########
ql/src/test/results/clientpositive/llap/check_constraint.q.out:
##########
@@ -2061,65 +2061,87 @@ PREHOOK: query: explain cbo update acid_uami_n0 set de 
= 893.14 where de = 103.0
 PREHOOK: type: QUERY
 PREHOOK: Input: default@acid_uami_n0
 PREHOOK: Output: default@acid_uami_n0
+PREHOOK: Output: default@acid_uami_n0
 POSTHOOK: query: explain cbo update acid_uami_n0 set de = 893.14 where de = 
103.00 or de = 119.00
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@acid_uami_n0
 POSTHOOK: Output: default@acid_uami_n0
+POSTHOOK: Output: default@acid_uami_n0
 CBO PLAN:
-HiveSortExchange(distribution=[any], collation=[[0]])
-  HiveProject(row__id=[$5], i=[$0], _o__c2=[893.14:DECIMAL(5, 2)], vc=[$2])
-    HiveFilter(condition=[AND(IN($1, 103:DECIMAL(3, 0), 119:DECIMAL(3, 0)), 
enforce_constraint(IS NOT FALSE(>=(893.14, CAST($0):DECIMAL(5, 2)))))])

Review Comment:
   it seems like we lost the `enforce_constraint` - is that expected?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 778105)
    Time Spent: 4.5h  (was: 4h 20m)

> Rewrite Update statement as Multi-insert and do Update split early
> ------------------------------------------------------------------
>
>                 Key: HIVE-21160
>                 URL: https://issues.apache.org/jira/browse/HIVE-21160
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Work logged] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early

Reply via email to