Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23436 )

Change subject: IMPALA-14447: Parallelize table loading in getMissingTables()
......................................................................


Patch Set 1:

(4 comments)

Thanks for working on this! I see there are some difference in query profile 
inside the CatalogFetch counters. Haven't looked into the cause yet. But we 
might need some tests to verify table loading is triggered in parallel.

http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@331
PS1, Line 331:     List<FeTable> tables = tbls.parallelStream()
Tried this but it's not actually parallelized. There is a related discussion:
https://stackoverflow.com/questions/28985704/parallel-stream-from-a-hashset-doesnt-run-in-parallel

Let's create an ArrayList on 'tbl' to parallelStream().


http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@336
PS1, Line 336:                                  FeDb db = 
catalog.getDb(tblName.getDb());
It's stated that LocalCatalog and any of the related catalog object 
implementations are not thread-safe:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java#L74-L75

We might need to improve some methods to make them thread-safe.


http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@339
PS1, Line 339:                                  return 
db.getTable(tblName.getTbl());
Logs of the other threads are missing the query id.


http://gerrit.cloudera.org:8080/#/c/23436/1/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@342
PS1, Line 342:                                .collect(Collectors.toList());
nit: Can we reduce some idention? E.g.

    List<FeTable> tables =
        tbls.parallelStream()
            .map(tblName -> {
              if (loadedOrFailedTbls_.containsKey(tblName)) {
                return null;
              }
              FeDb db = catalog.getDb(tblName.getDb());
              if (db == null) return null;
              dbs_.add(tblName.getDb());
              return db.getTable(tblName.getTbl());
            })
            .filter(Objects::nonNull)
            .collect(Collectors.toList());



--
To view, visit http://gerrit.cloudera.org:8080/23436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97a5165844ae846b28338d62e93a20121488d79f
Gerrit-Change-Number: 23436
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Thu, 18 Sep 2025 07:54:19 +0000
Gerrit-HasComments: Yes

Reply via email to