Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
......................................................................


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/20753/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20753/8//COMMIT_MSG@14
PS8, Line 14: data sequence number of a delete file has to
            : be greater than the data sequence number of the data file being
            : investigated
> nit: improvement idea for the future:
This optimization sounds reasonable for me. created IMPALA-12649 for this. 
(copy-pasted your explanation to the ticket:) )


http://gerrit.cloudera.org:8080/#/c/20753/8//COMMIT_MSG@39
PS8, Line 39: hav
> have
Done


http://gerrit.cloudera.org:8080/#/c/20753/8//COMMIT_MSG@45
PS8, Line 45: table
> nit: tables
Done


http://gerrit.cloudera.org:8080/#/c/20753/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20753/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@504
PS8, Line 504:               equalityDeleteFiles_.add(delFileDesc.first);
             :               addEqualityIds(delFile.equalityFieldIds());
             :             } else {
             :               Preconditions.checkState(delFile.content() == 
FileContent.POSITION_DELETES);
             :               positionDeleteFiles_.add(delFileDesc.first);
             :             }
             :           }
             :         }
             :       }
> nit: for readability, this could go to its own method.
Done


http://gerrit.cloudera.org:8080/#/c/20753/8/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/20753/8/testdata/data/README@776
PS8, Line 776: when running a DELETE FROM statement
> Could we also hack INSERT INTO? In that case Impala would just write the Pa
I think that is also a valid way to hack these eq-deletes. Next time I'll try 
it this way :)


http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1262
PS8, Line 1262: e
> nit: we usually don't add space here
Done


http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1264
PS8, Line 1264:
              :   @SkipIfDockerizedCluster.internal_hostname
              :   @SkipIf.hardcoded_uris
              :   def test_multiple_equality_ids(self, unique_database):
              :     """This test loads an Iceberg table that has 2 equality 
delete files with different
              :         equality ID lists. A query on such a table fails due to 
lack of support."""
              :     SRC_DIR = os.path.join(os.environ['IMPALA_HOME'],
              :         "testdata/data/iceberg_test/hadoop_catalog/ice/"
              :         "iceberg_v2_delete_different_equality_ids")
              :     DST_DIR = 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/" \
              :         "iceberg_v2_delete_different_equality_ids"
> Can we use 'file_utils.create_iceberg_table_from_directory'? With equality
Good Idea. I gave this a try, however I realized that for this particular table 
it's not possible to actually load it as there are different equality fields ID 
lists, so even an ALTER TABLE after the CREATE TABLE would give an error. So in 
practice for hadoop catalog I had to add a number of 'is hadoop catalog' kind 
of checks inot the function plus I had to also make an option to not run the 
last ALTER TABLE statement in the function. I found this too much complexity 
being added for this particular table. In general for adding tables with this 
function into hadoop catalog would be great but it wouldn;t work in this case.


http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1278
PS8, Line 1278:
> Is there a reason we create an external table?
If I create it as a non-external table I get an error saying that the table 
already exists. Anyway, create_iceberg_table_from_directory() also creates an 
external table


http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1290
PS8, Line 1290:
              :       err = self.execute_query_e
> What are these lists?
These are the mismatching equality ID lists. Added a comment for more clarity.


http://gerrit.cloudera.org:8080/#/c/20753/10/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20753/10/tests/query_test/test_iceberg.py@1267
PS10, Line 1267:   def test_multiple_equality_ids(self, unique_database):
Added the test table to unique_database so that I can save on level of 
try-catch block



--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 18 Dec 2023 15:07:58 +0000
Gerrit-HasComments: Yes

Reply via email to