Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/22189 )
Change subject: IMPALA-13501: Clean up uncommitted Iceberg files after validation check failure ...................................................................... Patch Set 1: (10 comments) Thanks for the changes, Noemi! This is an important feature to have IMO http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7561 PS1, Line 7561: String debugAction = update.getDebug_action(); nit: would it fit into a single line if you merged this one with the one below? http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7564 PS1, Line 7564: ImpalaRuntimeException > Do we need to catch ImpalaRuntimeException here? Can a validation error als I think SnapshotProducer.commit() could also throw a CommitFailedException and the validate*() functions, also getting IOexception during the validation check could result in throwing and UncheckedIOException (that extends RuntimeException). I have the impression that we can get a lot of different exceptions here and since Iceberg uses RuntimeExceptions it's not that easy to track all of them down. I think the safest approach is to catch all Exceptions here. At the end of the day it doesn't matter what failed the commit/validations, we have to clean up anyway. http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7585 PS1, Line 7585: iceTxn.commitTransaction(); I'm a bit confused now how this works in general. Do we make the actual commit of the new snapshot to the table here? Do do we commit the new snapshot in L7565? If this one can also throw exceptions then we don't do the cleanup here. http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7587 PS1, Line 7587: } I'm wondering if it makes sense to repro the "uncleaned files" scenario by not the debug actions but with actually setting up a test where multiple updates/deletes race against each other. http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@605 PS1, Line 605: * data and delete files which have to be cleaned up. nit: this seems an odd indentation http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@621 PS1, Line 621: deleteIfExists(path); In theory deleteIfExists can throw an IOException. So if it does for some reason for a file at the middle of the list, we won't delete the rest. Wondering how likely this is, and if we should handle such cases. I recall the validation*() functions in Iceberg could also fail with IOExceptions, so the same root cause might result in an IOException here too. http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/util/DebugUtils.java File fe/src/main/java/org/apache/impala/util/DebugUtils.java: http://gerrit.cloudera.org:8080/#/c/22189/1/fe/src/main/java/org/apache/impala/util/DebugUtils.java@196 PS1, Line 196: exceptionToThrow = new ValidationException(param); Do I get it right that this is introduced to simulate when a new snapshot can't be committed due to conflicts with other parallel commits? I think in that case Iceberg throws a CommitFailedException: https://github.com/apache/iceberg/blob/ff813445916bfd6ec1cc30a02b02f8bade7a26f6/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L403 http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2018 PS1, Line 2018: result = str(err) No need to introduce a 'result' variable here http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2020 PS1, Line 2020: "ValidationException: simulated validation check failure") nit: indentation seems to be off compared to other line breaks http://gerrit.cloudera.org:8080/#/c/22189/1/tests/query_test/test_iceberg.py@2032 PS1, Line 2032: assert "Found 1 items" in files_result I get that metadata file removal is performed by the Iceberg lib, but I think we can add a check for that too. With that we could catch if the library is broken after a version bump. -- To view, visit http://gerrit.cloudera.org:8080/22189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe59546ebf3c639b75b53dfa1daba37cef50eb21 Gerrit-Change-Number: 22189 Gerrit-PatchSet: 1 Gerrit-Owner: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 11 Dec 2024 08:46:40 +0000 Gerrit-HasComments: Yes
