Xuebin Su has posted comments on this change. ( http://gerrit.cloudera.org:8080/22014 )

Change subject: IMPALA-13154: Update stats when loading an HDFS table
......................................................................

Patch Set 4:

(5 comments)

> Patch Set 1:
>
> (5 comments)

Thanks for reviewing!

http://gerrit.cloudera.org:8080/#/c/22014/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22014/1//COMMIT_MSG@7
PS1, Line 7: load
> nit: "loading"
Thanks! Changed.

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@369
PS1, Line 369: criptor(sd.toThrif
> This would be an expensive operation for large tables, e.g. with 5M files.
Thanks! Changed.

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@373
PS1, Line 373: if (part.getWriteId() >= 0)
             :   thriftHdfsPart.setWrite_id(part.getWriteId());
             : if (type == ThriftObjectType.FULL) {
             :   thriftHdfsPart.setPartition_name(part.getPartitionName());
             :   thriftHdfsPart.setStats(new TTableStats(part.getNumRows()));
             :
> I think we can simply set 'numFiles' to HdfsPartition#getNumFileDescriptors
Thanks! Changed.

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/22014/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1240
PS1, Line 1240: partitionStats.numBlocks >= 0 && partitionStats.totalFileBytes >= 0);
              : newStats.merge(partitionStats);
> 'fileMetadataStats_' is still the old stats before we update it in the next
Thanks! Moved this line down to preserve the order of the old code.

http://gerrit.cloudera.org:8080/#/c/22014/1/tests/webserver/test_web_pages.py
File tests/webserver/test_web_pages.py:

http://gerrit.cloudera.org:8080/#/c/22014/1/tests/webserver/test_web_pages.py@535
PS1, Line 535: self.client.set_configuration(query_options)
             : query_handle = self.client.execute_async(query)
> These might fail since we only show the top-25 tables:
Thanks! Changed.

--
To view, visit http://gerrit.cloudera.org:8080/22014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e2eb503b0f61b1e6403058bc5dc78d721e7e940
Gerrit-Change-Number: 22014
Gerrit-PatchSet: 4
Gerrit-Owner: Xuebin Su <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Xuebin Su <[email protected]>
Gerrit-Comment-Date: Mon, 04 Nov 2024 09:22:07 +0000
Gerrit-HasComments: Yes
