Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23436 )
Change subject: IMPALA-14447: Parallelize table loading in getMissingTables() ...................................................................... Patch Set 3: (2 comments) > Thanks for working on this! I see there are some difference in query profile > inside the CatalogFetch counters. Haven't looked into the cause yet. But we > might need some tests to verify table loading is triggered in parallel. Left a comment about the cause of the profile issue. Maybe some existing tests can also identify it. http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@339 PS1, Line 339: tblName.toString(); > Patch set 3 adds ThreadNameAnnotator. Thanks for digging into this. We can work on adding query ids in IMPALA-12870. For the thread annotation, we already set one in CatalogdMetaProvider.loadWithCaching(), e.g. "ForkJoinPool.commonPool-worker-2 [LoadWithCaching for table metadata for default.part_900k_parq1]" #57 daemon prio=5 os_prio=0 cpu=74.82ms elapsed=211.72s tid=0x0000000006d25080 nid=0xa97c runnable [0x00007f05f97fc000] java.lang.Thread.State: RUNNABLE at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method) at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:442) at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:434) at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:200) at org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:777) at org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:769) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:571) at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:765) at org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:153) at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:105) at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:145) at org.apache.impala.analysis.StmtMetadataLoader.lambda$getMissingTables$1(StmtMetadataLoader.java:333) at org.apache.impala.analysis.StmtMetadataLoader$$Lambda$242/0x00000008012b3468.apply(Unknown Source) at java.util.stream.ReferencePipeline$3$1.accept([email protected]/ReferencePipeline.java:197) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining([email protected]/ArrayList.java:1625) at java.util.stream.AbstractPipeline.copyInto([email protected]/AbstractPipeline.java:509) at java.util.stream.AbstractPipeline.wrapAndCopyInto([email protected]/AbstractPipeline.java:499) at java.util.stream.ReduceOps$ReduceTask.doLeaf([email protected]/ReduceOps.java:960) at java.util.stream.ReduceOps$ReduceTask.doLeaf([email protected]/ReduceOps.java:934) at java.util.stream.AbstractTask.compute([email protected]/AbstractTask.java:327) at java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754) at java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373) at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182) at java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655) at java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622) at java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165) But it's fine to have one more annotation here. http://gerrit.cloudera.org:8080/#/c/23436/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/23436/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@653 PS3, Line 653: FrontendProfile profile = FrontendProfile.getCurrentOrNull(); Only the initial thread has this thread-local profile that is constructed here: https://github.com/apache/impala/blob/0e3079202380f0ed766659311166fe8728781d30/fe/src/main/java/org/apache/impala/service/Frontend.java#L2091 For other threads launched by parallelStream(), we need them to update this profile as well. -- To view, visit http://gerrit.cloudera.org:8080/23436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97a5165844ae846b28338d62e93a20121488d79f Gerrit-Change-Number: 23436 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Fri, 19 Sep 2025 15:49:00 +0000 Gerrit-HasComments: Yes
