Noemi Pap-Takacs has posted comments on this change. ( http://gerrit.cloudera.org:8080/22014 )
Change subject: IMPALA-13154: Update stats when loading an HDFS table
......................................................................

Patch Set 12:

(2 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@373
PS1, Line 373:     if (part.getWriteId() >= 0)
:       thriftHdfsPart.setWrite_id(part.getWriteId());
:     if (type == ThriftObjectType.FULL) {
:       thriftHdfsPart.setPartition_name(part.getPartitionName());
:       thriftHdfsPart.setStats(new TTableStats(part.getNumRows()));
:
> Note that when the table is a Hive ACID table or Iceberg V2 table,
> we use insertFileDescriptors and deleteFileDescriptors, and keep
> fileDescriptors as empty. For other kinds of HDFS table, we use
> fileDescriptors and keep insertFileDescriptors and
> deleteFileDescriptors as empty.

This is not true for Iceberg tables. In V2 tables we count both data and delete files simply as fileDescriptors, and do not put them into insertFileDescriptors and deleteFileDescriptors. We keep track of the file type in GroupedContentFiles.

http://gerrit.cloudera.org:8080/#/c/22014/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/22014/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827
PS12, Line 827:     // TODO(todd): would be good to log a summary of the loading process:

Just an idea: what about collecting the table/partition loading stats from the loaders' LoadStats objects (available in ParallelFileMetadataLoader), and summarizing them here into FileMetadataStats?
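
To make the idea concrete, here is a rough sketch. LoaderStatsSketch and TableFileStatsSketch are hypothetical stand-ins for LoadStats and FileMetadataStats; the field names and the summarize() helper are assumptions made for illustration, not the actual Impala API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: none of these types are the real Impala classes.
final class LoadStatsSummarySketch {

  /** Stand-in for the counters one loader gathers while listing files. */
  static final class LoaderStatsSketch {
    final int loadedFiles;   // assumed field: files loaded by this loader
    final long loadedBytes;  // assumed field: total size of those files

    LoaderStatsSketch(int loadedFiles, long loadedBytes) {
      this.loadedFiles = loadedFiles;
      this.loadedBytes = loadedBytes;
    }
  }

  /** Stand-in for the table-level summary kept by the table. */
  static final class TableFileStatsSketch {
    long numFiles;
    long totalFileBytes;

    /** Folds one loader's counters into the running totals. */
    void add(LoaderStatsSketch stats) {
      numFiles += stats.loadedFiles;
      totalFileBytes += stats.loadedBytes;
    }
  }

  /** Sums the per-loader counters without touching the file descriptors again. */
  static TableFileStatsSketch summarize(List<LoaderStatsSketch> perLoaderStats) {
    TableFileStatsSketch total = new TableFileStatsSketch();
    for (LoaderStatsSketch stats : perLoaderStats) {
      total.add(stats);
    }
    return total;
  }

  public static void main(String[] args) {
    List<LoaderStatsSketch> perPartition = Arrays.asList(
        new LoaderStatsSketch(3, 300L),
        new LoaderStatsSketch(2, 50L));
    TableFileStatsSketch total = summarize(perPartition);
    // The same totals could feed the log summary mentioned in the TODO.
    System.out.println(
        "Loaded " + total.numFiles + " files, " + total.totalFileBytes + " bytes");
  }
}
```
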
Currently these 2 classes are not connected, and we iterate through the file descriptors twice (once in the FileMetadataLoaders during loading and once in HdfsTable) just to get simple stats like the number of files. We can also log them. See IMPALA-13122.

--
To view, visit http://gerrit.cloudera.org:8080/22014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e2eb503b0f61b1e6403058bc5dc78d721e7e940
Gerrit-Change-Number: 22014
Gerrit-PatchSet: 12
Gerrit-Owner: Xuebin Su <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 15 Nov 2024 14:37:45 +0000
Gerrit-HasComments: Yes
