Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23436 )

Change subject: IMPALA-14447: Parallelize table loading in getMissingTables()
......................................................................


Patch Set 3:

(2 comments)

> Thanks for working on this! I see there are some difference in query profile 
> inside the CatalogFetch counters. Haven't looked into the cause yet. But we 
> might need some tests to verify table loading is triggered in parallel.

Left a comment about the cause of the profile issue. Maybe some existing tests 
can also identify it.

http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@339
PS1, Line 339:               tblName.toString();
> Patch set 3 adds ThreadNameAnnotator.
Thanks for digging into this. We can work on adding query ids in IMPALA-12870.

For the thread annotation, we already set one in 
CatalogdMetaProvider.loadWithCaching(), e.g.

"ForkJoinPool.commonPool-worker-2 [LoadWithCaching for table metadata for 
default.part_900k_parq1]" #57 daemon prio=5 os_prio=0 cpu=74.82ms 
elapsed=211.72s tid=0x0000000006d25080 nid=0xa97c runnable  [0x00007f05f97fc000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
        at 
org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:442)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:434)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:200)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:777)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:769)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:571)
        at 
org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:765)
        at 
org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:153)
        at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:105)
        at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:145)
        at 
org.apache.impala.analysis.StmtMetadataLoader.lambda$getMissingTables$1(StmtMetadataLoader.java:333)
        at 
org.apache.impala.analysis.StmtMetadataLoader$$Lambda$242/0x00000008012b3468.apply(Unknown
 Source)
        at 
java.util.stream.ReferencePipeline$3$1.accept([email protected]/ReferencePipeline.java:197)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining([email protected]/ArrayList.java:1625)
        at 
java.util.stream.AbstractPipeline.copyInto([email protected]/AbstractPipeline.java:509)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto([email protected]/AbstractPipeline.java:499)
        at 
java.util.stream.ReduceOps$ReduceTask.doLeaf([email protected]/ReduceOps.java:960)
        at 
java.util.stream.ReduceOps$ReduceTask.doLeaf([email protected]/ReduceOps.java:934)
        at 
java.util.stream.AbstractTask.compute([email protected]/AbstractTask.java:327)
        at 
java.util.concurrent.CountedCompleter.exec([email protected]/CountedCompleter.java:754)
        at 
java.util.concurrent.ForkJoinTask.doExec([email protected]/ForkJoinTask.java:373)
        at 
java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec([email protected]/ForkJoinPool.java:1182)
        at 
java.util.concurrent.ForkJoinPool.scan([email protected]/ForkJoinPool.java:1655)
        at 
java.util.concurrent.ForkJoinPool.runWorker([email protected]/ForkJoinPool.java:1622)
        at 
java.util.concurrent.ForkJoinWorkerThread.run([email protected]/ForkJoinWorkerThread.java:165)

But it's fine to have one more annotation here.


http://gerrit.cloudera.org:8080/#/c/23436/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/23436/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@653
PS3, Line 653:     FrontendProfile profile = FrontendProfile.getCurrentOrNull();
Only the initial thread has this thread-local profile that is constructed here:
https://github.com/apache/impala/blob/0e3079202380f0ed766659311166fe8728781d30/fe/src/main/java/org/apache/impala/service/Frontend.java#L2091

For other threads launched by parallelStream(), we need them to update this 
profile as well.



--
To view, visit http://gerrit.cloudera.org:8080/23436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97a5165844ae846b28338d62e93a20121488d79f
Gerrit-Change-Number: 23436
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Fri, 19 Sep 2025 15:49:00 +0000
Gerrit-HasComments: Yes

Reply via email to