Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23436 )
Change subject: IMPALA-14447: Parallelize table loading in getMissingTables() ...................................................................... Patch Set 1: (4 comments) Thanks for working on this! I see there are some difference in query profile inside the CatalogFetch counters. Haven't looked into the cause yet. But we might need some tests to verify table loading is triggered in parallel. http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@331 PS1, Line 331: List<FeTable> tables = tbls.parallelStream() Tried this but it's not actually parallelized. There is a related discussion: https://stackoverflow.com/questions/28985704/parallel-stream-from-a-hashset-doesnt-run-in-parallel Let's create an ArrayList on 'tbl' to parallelStream(). http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@336 PS1, Line 336: FeDb db = catalog.getDb(tblName.getDb()); It's stated that LocalCatalog and any of the related catalog object implementations are not thread-safe: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L74-L75 We might need to improve some methods to make them thread-safe. http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@339 PS1, Line 339: return db.getTable(tblName.getTbl()); Logs of the other threads are missing the query id. http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@342 PS1, Line 342: .collect(Collectors.toList()); nit: Can we reduce some idention? E.g. List<FeTable> tables = tbls.parallelStream() .map(tblName -> { if (loadedOrFailedTbls_.containsKey(tblName)) { return null; } FeDb db = catalog.getDb(tblName.getDb()); if (db == null) return null; dbs_.add(tblName.getDb()); return db.getTable(tblName.getTbl()); }) .filter(Objects::nonNull) .collect(Collectors.toList()); -- To view, visit http://gerrit.cloudera.org:8080/23436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97a5165844ae846b28338d62e93a20121488d79f Gerrit-Change-Number: 23436 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Thu, 18 Sep 2025 07:54:19 +0000 Gerrit-HasComments: Yes
