[ https://issues.apache.org/jira/browse/HIVE-21160?focusedWorklogId=778105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778105 ]
ASF GitHub Bot logged work on HIVE-21160: ----------------------------------------- Author: ASF GitHub Bot Created on: 03/Jun/22 14:26 Start Date: 03/Jun/22 14:26 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on code in PR #2855: URL: https://github.com/apache/hive/pull/2855#discussion_r888964933 ########## ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java: ########## @@ -264,9 +257,104 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException { } } - private void validateTxnManager(Table mTable) throws SemanticException { - if (!AcidUtils.acidTableWithoutTransactions(mTable) && !getTxnMgr().supportsAcid()) { - throw new SemanticException(ErrorMsg.ACID_OP_ON_NONACID_TXNMGR.getMsg()); + private void analyzeSplitUpdate(ASTNode tree, Table mTable, ASTNode tabNameNode) throws SemanticException { + operation = Context.Operation.UPDATE; + + List<? extends Node> children = tree.getChildren(); + + ASTNode where = null; + int whereIndex = 2; + if (children.size() > whereIndex) { + where = (ASTNode) children.get(whereIndex); + assert where.getToken().getType() == HiveParser.TOK_WHERE : + "Expected where clause, but found " + where.getName(); + } + + Set<String> setRCols = new LinkedHashSet<>(); +// TOK_UPDATE_TABLE +// TOK_TABNAME +// ... +// TOK_SET_COLUMNS_CLAUSE <- The set list from update should be the second child (index 1) + assert children.size() >= 2 : "Expected update token to have at least two children"; + ASTNode setClause = (ASTNode) children.get(1); + Map<String, ASTNode> setCols = collectSetColumnsAndExpressions(setClause, setRCols, mTable); + Map<Integer, ASTNode> setColExprs = new HashMap<>(setClause.getChildCount()); + + List<FieldSchema> nonPartCols = mTable.getCols(); + Map<String, String> colNameToDefaultConstraint = getColNameToDefaultValueMap(mTable); + List<String> values = new ArrayList<>(mTable.getCols().size()); + StringBuilder rewrittenQueryStr = createRewrittenQueryStrBuilder(); Review Comment: note: I always feeled that we need some `HiveSqlQueryBuilder` ; we seem to resort to some concatenator to build our own rewritten queries....it would have come handy here as well :D ########## ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java: ########## @@ -264,9 +257,104 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException { } } - private void validateTxnManager(Table mTable) throws SemanticException { - if (!AcidUtils.acidTableWithoutTransactions(mTable) && !getTxnMgr().supportsAcid()) { - throw new SemanticException(ErrorMsg.ACID_OP_ON_NONACID_TXNMGR.getMsg()); + private void analyzeSplitUpdate(ASTNode tree, Table mTable, ASTNode tabNameNode) throws SemanticException { + operation = Context.Operation.UPDATE; + + List<? extends Node> children = tree.getChildren(); + + ASTNode where = null; + int whereIndex = 2; + if (children.size() > whereIndex) { + where = (ASTNode) children.get(whereIndex); + assert where.getToken().getType() == HiveParser.TOK_WHERE : + "Expected where clause, but found " + where.getName(); + } + + Set<String> setRCols = new LinkedHashSet<>(); +// TOK_UPDATE_TABLE +// TOK_TABNAME +// ... +// TOK_SET_COLUMNS_CLAUSE <- The set list from update should be the second child (index 1) + assert children.size() >= 2 : "Expected update token to have at least two children"; + ASTNode setClause = (ASTNode) children.get(1); + Map<String, ASTNode> setCols = collectSetColumnsAndExpressions(setClause, setRCols, mTable); + Map<Integer, ASTNode> setColExprs = new HashMap<>(setClause.getChildCount()); + + List<FieldSchema> nonPartCols = mTable.getCols(); + Map<String, String> colNameToDefaultConstraint = getColNameToDefaultValueMap(mTable); + List<String> values = new ArrayList<>(mTable.getCols().size()); + StringBuilder rewrittenQueryStr = createRewrittenQueryStrBuilder(); + rewrittenQueryStr.append("(SELECT ROW__ID"); + for (int i = 0; i < nonPartCols.size(); i++) { + rewrittenQueryStr.append(','); + String name = nonPartCols.get(i).getName(); + ASTNode setCol = setCols.get(name); + String identifier = HiveUtils.unparseIdentifier(name, this.conf); + + if (setCol != null) { + if (setCol.getType() == HiveParser.TOK_TABLE_OR_COL && + setCol.getChildCount() == 1 && setCol.getChild(0).getType() == HiveParser.TOK_DEFAULT_VALUE) { + rewrittenQueryStr.append(colNameToDefaultConstraint.get(name)); Review Comment: I don't know in what form we are storing the constraint defaults; I think they are escaped...because they are run thru a `ParseDriver` - so this should be ok. do we have this path covered with some test? I haven't seen any.. I wonder if we need to take care of replacing the `DEFAULT` at this point - as the query will be parsed again...won't that be able to handle these defaults? ########## ql/src/test/results/clientpositive/llap/update_where_partitioned.q.out: ########## @@ -63,15 +63,20 @@ PREHOOK: type: QUERY PREHOOK: Input: default@acid_uwp PREHOOK: Input: default@acid_uwp@ds=today PREHOOK: Input: default@acid_uwp@ds=tomorrow +PREHOOK: Output: default@acid_uwp PREHOOK: Output: default@acid_uwp@ds=today PREHOOK: Output: default@acid_uwp@ds=tomorrow POSTHOOK: query: update acid_uwp set b = 'fred' where b = 'k17Am8uPHWk02cEf1jet' POSTHOOK: type: QUERY POSTHOOK: Input: default@acid_uwp POSTHOOK: Input: default@acid_uwp@ds=today POSTHOOK: Input: default@acid_uwp@ds=tomorrow +POSTHOOK: Output: default@acid_uwp +POSTHOOK: Output: default@acid_uwp@ds=today POSTHOOK: Output: default@acid_uwp@ds=today POSTHOOK: Output: default@acid_uwp@ds=tomorrow +POSTHOOK: Lineage: acid_uwp PARTITION(ds=today).a SIMPLE [(acid_uwp)acid_uwp.FieldSchema(name:a, type:int, comment:null), ] Review Comment: w.r.t to security stuff; this kinda means that we will be "reading" column `A` in case of an update of column `B`; I think an interesting question would be: should we support the case of creating ranger policies in which a user is allowed to update a column (b) in a table; but not allowed to read some other column (`a`) - I think right now we support this; should we keep doing that? I don't know how complicated would be to omit the columns we are touching only because of this rewrite ########## ql/src/test/results/clientnegative/authorization_update_noupdatepriv.q.out: ########## @@ -6,4 +6,4 @@ POSTHOOK: query: create table auth_noupd(i int, j int) clustered by (j) into 2 b POSTHOOK: type: CREATETABLE POSTHOOK: Output: database:default POSTHOOK: Output: default@auth_noupd -FAILED: HiveAccessControlException Permission denied: Principal [name=user1, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.auth_noupd], [UPDATE] on Object [type=TABLE_OR_VIEW, name=default.auth_noupd]] Review Comment: earlier it was checking for `UPDATE` + `SELECT` ; now it checks for `INSERT`+`SELECT`+`DELETE` this change might be unexpected to people working on authorization....not sure what we could do about it; but I think in its current form; depending on the feature toggle we will be checking: UPDATE/SELECT+DELETE - which could be unexpected ########## ql/src/test/results/clientpositive/llap/check_constraint.q.out: ########## @@ -2061,65 +2061,87 @@ PREHOOK: query: explain cbo update acid_uami_n0 set de = 893.14 where de = 103.0 PREHOOK: type: QUERY PREHOOK: Input: default@acid_uami_n0 PREHOOK: Output: default@acid_uami_n0 +PREHOOK: Output: default@acid_uami_n0 POSTHOOK: query: explain cbo update acid_uami_n0 set de = 893.14 where de = 103.00 or de = 119.00 POSTHOOK: type: QUERY POSTHOOK: Input: default@acid_uami_n0 POSTHOOK: Output: default@acid_uami_n0 +POSTHOOK: Output: default@acid_uami_n0 CBO PLAN: -HiveSortExchange(distribution=[any], collation=[[0]]) - HiveProject(row__id=[$5], i=[$0], _o__c2=[893.14:DECIMAL(5, 2)], vc=[$2]) - HiveFilter(condition=[AND(IN($1, 103:DECIMAL(3, 0), 119:DECIMAL(3, 0)), enforce_constraint(IS NOT FALSE(>=(893.14, CAST($0):DECIMAL(5, 2)))))]) Review Comment: it seems like we lost the `enforce_constraint` - is that expected? Issue Time Tracking ------------------- Worklog Id: (was: 778105) Time Spent: 4.5h (was: 4h 20m) > Rewrite Update statement as Multi-insert and do Update split early > ------------------------------------------------------------------ > > Key: HIVE-21160 > URL: https://issues.apache.org/jira/browse/HIVE-21160 > Project: Hive > Issue Type: Sub-task > Components: Transactions > Affects Versions: 3.0.0 > Reporter: Eugene Koifman > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)