Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
......................................................................


Patch Set 4:

(11 comments)

Thanks for the work, Tamas!

http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@131
PS4, Line 131: Collects the field ids of the struct members.
nit: this part of the comment might be misleading because this function not 
just collects the field IDs but also creates the accessors.


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@135
PS4, Line 135: Collect
nit: The above function name starts with "Create" while this one with "Collect" 
but actually serves the same purpose. Could you please unify the names?


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@97
PS4, Line 97:     } else if (slot_desc->col_path().size() > 1) {
Could you add a dcheck that slot_desc is primitive?


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@101
PS4, Line 101: ColumnType current_type =
             :           
tuple_desc_->table_desc()->col_descs()[root_type_index].type();
             :       for (int i = 1; i < slot_desc->col_path().size() - 1; ++i) 
{
             :         current_type = 
current_type.children[slot_desc->col_path()[i]];
             :       }
I'm not sure I get the point of this for loop. Is it deliberate that you get 
the second type for the second layer, third part for the third layer etc.?
It would hep a lot to have a comment here how this ColumnType structure looks 
like in this case.


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@119
PS4, Line 119: {
nit: no need for the braces here.


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@120
PS4, Line 120: std::vector<int>
This is a copy. You can make this a const ref I think.


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@77
PS4, Line 77:     if (!slot_desc->type().IsStructType()) {
Structs can also be nulls in theory. I'm not sure about the struct values in 
the metadata table, though, just wanted to raise attention that we might want 
to run the code in this block also for structs.


http://gerrit.cloudera.org:8080/#/c/20759/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@181
PS4, Line 181:   RETURN_IF_ERROR(MaterializeTuple(env, struct_like_row,
DCHECK that slot_desc is struct (and not null)?


http://gerrit.cloudera.org:8080/#/c/20759/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/20759/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@241
PS4, Line 241:       throw new AnalysisException(
Is there a test to cover this?


http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@458
PS4, Line 458: ####
Could you have another test where you join 2 different metadata tables, and 
query a struct from each one? Also some order by and aggregation would be nice 
on them.


http://gerrit.cloudera.org:8080/#/c/20759/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@459
PS4, Line 459: # Test 9 : Query STRUCT type columns
Do I get it right that for a SELECT * query we still won't have the struct 
columns included into the result? I think that'd make this struct support very 
limited because the users would have to know the metadata table columns names 
by heart or they'd have to keep looking up the names from the Iceberg spec.
What do you think about unconditionally include the struct cols for a metadata 
table query? I think they serve an essential part of such a query so would make 
sense in my opinion.
I read you other comment about the lack of BINARY type support, however, I 
think binary data could be read as STRING so one option is to show them as 
strings. Another option is to omit the binary data or set them to null.



-- 
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 11 Dec 2023 15:34:36 +0000
Gerrit-HasComments: Yes

Reply via email to