Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22189 )

Change subject: IMPALA-13501: Clean up uncommitted Iceberg files after 
validation check failure
......................................................................


Patch Set 1:

(10 comments)

Thanks for the changes, Noemi! This is an important feature to have IMO

http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7561
PS1, Line 7561:         String debugAction = update.getDebug_action();
nit: would it fit into a single line if you merged this one with the one below?


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7564
PS1, Line 7564: ImpalaRuntimeException
> Do we need to catch ImpalaRuntimeException here? Can a validation error als
I think SnapshotProducer.commit() could also throw a CommitFailedException and 
the validate*() functions, also getting IOexception during the validation check 
could result in throwing and UncheckedIOException (that extends 
RuntimeException). I have the impression that we can get a lot of different 
exceptions here and since Iceberg uses RuntimeExceptions it's not that easy to 
track all of them down.
I think the safest approach is to catch all Exceptions here. At the end of the 
day it doesn't matter what failed the commit/validations, we have to clean up 
anyway.


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7585
PS1, Line 7585:     iceTxn.commitTransaction();
I'm a bit confused now how this works in general. Do we make the actual commit 
of the new snapshot to the table here? Do do we commit the new snapshot in 
L7565?
If this one can also throw exceptions then we don't do the cleanup here.


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7587
PS1, Line 7587:   }
I'm wondering if it makes sense to repro the "uncleaned files" scenario by not 
the debug actions but with actually setting up a test where multiple 
updates/deletes race against each other.


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@605
PS1, Line 605:    *                  data and delete files which have to be 
cleaned up.
nit: this seems an odd indentation


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@621
PS1, Line 621:       deleteIfExists(path);
In theory deleteIfExists can throw an IOException. So if it does for some 
reason for a file at the middle of the list, we won't delete the rest. 
Wondering how likely this is, and if we should handle such cases.
I recall the validation*() functions in Iceberg could also fail with 
IOExceptions, so the same root cause might result in an IOException here too.


http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/util/DebugUtils.java
File fe/src/main/java/org/apache/impala/util/DebugUtils.java:

http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/util/DebugUtils.java@196
PS1, Line 196:               exceptionToThrow = new ValidationException(param);
Do I get it right that this is introduced to simulate when a new snapshot can't 
be committed due to conflicts with other parallel commits?
I think in that case Iceberg throws a CommitFailedException:
https://github.com/apache/iceberg/blob/ff813445916bfd6ec1cc30a02b02f8bade7a26f6/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L403


http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2018
PS1, Line 2018:       result = str(err)
No need to introduce a 'result' variable here


http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2020
PS1, Line 2020:                                 "ValidationException: simulated 
validation check failure")
nit: indentation seems to be off compared to other line breaks


http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2032
PS1, Line 2032:       assert "Found 1 items" in files_result
I get that metadata file removal is performed by the Iceberg lib, but I think 
we can add a check for that too. With that we could catch if the library is 
broken after a version bump.



--
To view, visit http://gerrit.cloudera.org:8080/22189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe59546ebf3c639b75b53dfa1daba37cef50eb21
Gerrit-Change-Number: 22189
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 11 Dec 2024 08:46:40 +0000
Gerrit-HasComments: Yes

Reply via email to