Noemi Pap-Takacs has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/22407 )

Change subject: IMPALA-12588: Don't UPDATE rows that already have the desired 
value
......................................................................

IMPALA-12588: Don't UPDATE rows that already have the desired value

When UPDATEing an Iceberg or Kudu table, we should change as few rows
as possible. For Iceberg tables, this means writing as few new
data records and delete records as possible.
Therefore, if rows already have the new values, we should just ignore
them. One way to achieve this is to add extra predicates, e.g.:

  UPDATE tbl SET k = 3 WHERE i > 4;
    ==>
  UPDATE tbl SET k = 3 WHERE i > 4 AND k IS DISTINCT FROM 3;

So we won't write new data/delete records for the rows that already have
the desired value.
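
The effect of the extra predicate can be tried out with a small
stand-alone sketch. It is not Impala code: it uses Python's built-in
sqlite3 module on a made-up table 'tbl', and SQLite's null-safe 'IS NOT'
operator stands in for IS DISTINCT FROM. Both statements leave the table
in the same state, but the rewritten one touches fewer rows:

```python
import sqlite3


def run_update(where_clause):
    """Run 'UPDATE tbl SET k = 3' with the given WHERE clause on a fresh
    in-memory table; return (rows touched, final table contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (i INTEGER, k INTEGER)")
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?)",
        # Made-up sample data; one row has k = NULL.
        [(1, 3), (5, 3), (6, 7), (7, None), (8, 3)],
    )
    cur = conn.execute("UPDATE tbl SET k = 3 WHERE " + where_clause)
    touched = cur.rowcount
    rows = conn.execute("SELECT i, k FROM tbl ORDER BY i").fetchall()
    conn.close()
    return touched, rows


# Original statement: every row with i > 4 is rewritten.
original = run_update("i > 4")
# Rewritten statement: 'IS NOT' is SQLite's null-safe inequality and is
# used here only as a stand-in for IS DISTINCT FROM.
rewritten = run_update("i > 4 AND k IS NOT 3")

print(original[0], rewritten[0])
```

The row with k = NULL is still updated by the rewritten statement, while
the rows that already hold 3 are left alone.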

How to create the extra predicates that filter out these rows:

If there are multiple assignments in the SET list, we can only skip
updating a row if all the assigned columns already hold the new values.
If any of the values needs to be updated, the entire row does.
Therefore we can think of the SET list as equality predicates connected
with AND: a row can be skipped only if all of them hold.
To negate this SET list, we have to negate the individual SET
assignments and connect them with OR (De Morgan's law).
Then add this new compound predicate to the original WHERE predicate
with an AND (if there was none, just create a WHERE predicate from it).

                AND
              /     \
      original        OR
 WHERE predicate    /    \
                  !a       OR
                         /    \
                       !b     !c

This simple graph illustrates how the WHERE predicate is rewritten
for an UPDATE statement that sets 3 columns.
'!a', '!b' and '!c' are the negations of the individual assignments in
the SET list. So the extended WHERE predicate is:
(original WHERE predicate) AND (!a OR !b OR !c)
To handle NULL values correctly, we use IS DISTINCT FROM instead of
simply negating the assignment with operator '!='. With '!=', comparing
a NULL column value yields NULL, so a row that still needs the update
would be filtered out.
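
The construction above can be sketched on plain SQL strings. This is
only an illustration: the function name 'extend_where' and the
(column, value) pair representation are hypothetical, and the patch
itself changes the Java frontend rather than concatenating strings.

```python
def extend_where(where, assignments):
    """Return the extended WHERE predicate as a SQL string.

    where       -- original WHERE predicate as a string, or None
    assignments -- list of (column, value_sql) pairs from the SET list
    (Hypothetical helper for illustration only.)
    """
    if not assignments:
        return where
    # Negate each assignment in a NULL-safe way.
    negated = [f"{col} IS DISTINCT FROM {val}" for col, val in assignments]
    # A row needs the update if at least one column differs: connect with OR.
    disjunction = "(" + " OR ".join(negated) + ")"
    if where is None:
        # No original predicate: the disjunction becomes the WHERE predicate.
        return disjunction
    return f"({where}) AND {disjunction}"


print(extend_where("i > 4", [("k", "3")]))
print(extend_where(None, [("a", "1"), ("b", "NULL"), ("c", "'x'")]))
```

The first call reproduces the shape of the single-column example at the
top of this message; the second covers an UPDATE without a WHERE clause.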

If the assignments contain UDFs, the result might be inconsistent
because of possible non-deterministic values or state in the UDFs.
Therefore, in that case we should not rewrite the WHERE predicate at all.

Evaluating the extra expressions can be expensive, therefore this
optimization can be limited or switched off entirely using the query
option SKIP_UNNEEDED_UPDATES_COL_LIMIT. By default, there is no filtering
if more than 10 assignments are in the SET list.
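
The two conditions for skipping the rewrite (UDFs in the SET list, too
many assignments) can be summarised in a small decision function. This
is a sketch under assumptions: the function and parameter names are
hypothetical, and the exact option value that switches the optimization
off is not stated in this message. Here a limit of 0 disables it simply
because any non-empty SET list then exceeds the limit.

```python
DEFAULT_COL_LIMIT = 10  # default described for SKIP_UNNEEDED_UPDATES_COL_LIMIT


def should_add_skip_predicates(num_assignments, has_udf,
                               col_limit=DEFAULT_COL_LIMIT):
    """Decide whether the WHERE predicate should be extended.

    Hypothetical helper; how the real option encodes "off" is an assumption.
    """
    if has_udf:
        # UDFs may be non-deterministic or stateful: never rewrite.
        return False
    # No filtering if the SET list has more assignments than the limit.
    return 0 < num_assignments <= col_limit


print(should_add_skip_predicates(3, has_udf=False))
print(should_add_skip_predicates(11, has_udf=False))
print(should_add_skip_predicates(1, has_udf=False, col_limit=0))
```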

Testing:
 - Analysis
 - Planner
 - E2E
 - Kudu
 - Iceberg
 - testing the new query option: SKIP_UNNEEDED_UPDATES_COL_LIMIT

Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-dml-with-utc-conversion.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M tests/query_test/test_iceberg.py
13 files changed, 398 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/22407/7
--
To view, visit http://gerrit.cloudera.org:8080/22407
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I926c80e8110de5a4615a3624a81a330f54317c8b
Gerrit-Change-Number: 22407
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
