Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 )
Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns ...................................................................... Patch Set 4: (11 comments) Thanks for the work, Tamas! http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@131 PS4, Line 131: Collects the field ids of the struct members. nit: this part of the comment might be misleading because this function not just collects the field IDs but also creates the accessors. http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@135 PS4, Line 135: Collect nit: The above function name starts with "Create" while this one with "Collect" but actually serves the same purpose. Could you please unify the names? http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@97 PS4, Line 97: } else if (slot_desc->col_path().size() > 1) { Could you add a dcheck that slot_desc is primitive? http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@101 PS4, Line 101: ColumnType current_type = : tuple_desc_->table_desc()->col_descs()[root_type_index].type(); : for (int i = 1; i < slot_desc->col_path().size() - 1; ++i) { : current_type = current_type.children[slot_desc->col_path()[i]]; : } I'm not sure I get the point of this for loop. Is it deliberate that you get the second type for the second layer, third part for the third layer etc.? It would hep a lot to have a comment here how this ColumnType structure looks like in this case. http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@119 PS4, Line 119: { nit: no need for the braces here. http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@120 PS4, Line 120: std::vector<int> This is a copy. You can make this a const ref I think. http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@77 PS4, Line 77: if (!slot_desc->type().IsStructType()) { Structs can also be nulls in theory. I'm not sure about the struct values in the metadata table, though, just wanted to raise attention that we might want to run the code in this block also for structs. http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@181 PS4, Line 181: RETURN_IF_ERROR(MaterializeTuple(env, struct_like_row, DCHECK that slot_desc is struct (and not null)? http://gerrit.cloudera.org:8080/#/c/20759/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/20759/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@241 PS4, Line 241: throw new AnalysisException( Is there a test to cover this? http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@458 PS4, Line 458: #### Could you have another test where you join 2 different metadata tables, and query a struct from each one? Also some order by and aggregation would be nice on them. http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@459 PS4, Line 459: # Test 9 : Query STRUCT type columns Do I get it right that for a SELECT * query we still won't have the struct columns included into the result? I think that'd make this struct support very limited because the users would have to know the metadata table columns names by heart or they'd have to keep looking up the names from the Iceberg spec. What do you think about unconditionally include the struct cols for a metadata table query? I think they serve an essential part of such a query so would make sense in my opinion. I read you other comment about the lack of BINARY type support, however, I think binary data could be read as STRING so one option is to show them as strings. Another option is to omit the binary data or set them to null. -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 4 Gerrit-Owner: Tamas Mate <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 11 Dec 2023 15:34:36 +0000 Gerrit-HasComments: Yes
