[ 
https://issues.apache.org/jira/browse/IMPALA-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029578#comment-18029578
 ] 

ASF subversion and git services commented on IMPALA-14447:
----------------------------------------------------------

Commit 1008decc0780fc4da9a3d35cafc5c93f9f3574e5 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1008decc0 ]

IMPALA-14447: Parallelize table loading in getMissingTables()

StmtMetadataLoader.getMissingTables() loads missing tables serially. In
local catalog mode, loading a large number of tables serially can incur
significant round-trip latency to CatalogD. This patch parallelizes the
table loading by using an executor service to look up and gather all
non-null FeTables from the given TableName set.

Modify LocalCatalog.loadDbs() and LocalDb.loadTableNames() slightly to
make them thread-safe. Change FrontendProfile.Scope to support nested
scopes referencing the same FrontendProfile instance.

Add a new flag, max_stmt_metadata_loader_threads, to control the maximum
number of threads used for loading table metadata during query
compilation. It defaults to 8 threads per query compilation.

If there is only one table to load, max_stmt_metadata_loader_threads is
set to 1, or a RejectedExecutionException is raised, fall back to loading
tables serially.
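
The parallel lookup and its serial fallback can be illustrated with a small
standalone sketch. This is not the code from the patch: the class
ParallelLoadSketch, the methods loadAll() and loadSerially(), and the
generic loader function are hypothetical names chosen for illustration, and
the maxThreads parameter only stands in for
max_stmt_metadata_loader_threads. It assumes each lookup is independent and
returns null for a table that cannot be found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Function;

// Hypothetical sketch; not the StmtMetadataLoader code from the patch.
final class ParallelLoadSketch {
  // Loads each name with `loader` and returns the non-null results in the
  // iteration order of `names`. `maxThreads` stands in for the
  // max_stmt_metadata_loader_threads flag.
  static <T> List<T> loadAll(Set<String> names, Function<String, T> loader,
      int maxThreads) {
    // A single table or a single thread gains nothing from a pool.
    if (names.size() <= 1 || maxThreads <= 1) return loadSerially(names, loader);
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.min(maxThreads, names.size()));
    List<Future<T>> futures = new ArrayList<>();
    try {
      for (String name : names) {
        Callable<T> task = () -> loader.apply(name);
        futures.add(pool.submit(task));
      }
      List<T> loaded = new ArrayList<>();
      for (Future<T> f : futures) {
        T table = f.get();  // blocks until this lookup finishes
        if (table != null) loaded.add(table);
      }
      return loaded;
    } catch (RejectedExecutionException e) {
      // The pool refused a task: cancel submitted work and load serially.
      for (Future<T> f : futures) f.cancel(true);
      return loadSerially(names, loader);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while loading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Load failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  // Serial path: one lookup after another, skipping null results.
  static <T> List<T> loadSerially(Set<String> names, Function<String, T> loader) {
    List<T> loaded = new ArrayList<>();
    for (String name : names) {
      T table = loader.apply(name);
      if (table != null) loaded.add(table);
    }
    return loaded;
  }
}
```

In this sketch the wall-clock cost of N lookups approaches that of the
slowest lookup once N is at most maxThreads, instead of the sum of all round
trips as in the serial path.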

Testing:
Ran and passed several tests such as test_catalogd_ha.py,
test_concurrent_ddls.py, and test_observability.py.
Added FE test CatalogdMetaProviderTest.testProfileParallelLoad.
Manually ran the following query and observed parallel loading by setting
TRACE level logging in CatalogdMetaProvider.java.

use functional;
select count(*) from alltypesnopart
union select count(*) from alltypessmall
union select count(*) from alltypestiny
union select count(*) from alltypesagg;

Change-Id: I97a5165844ae846b28338d62e93a20121488d79f
Reviewed-on: http://gerrit.cloudera.org:8080/23436
Reviewed-by: Quanlong Huang <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Metadata loading is not triggered in parallel in local catalog mode
> -------------------------------------------------------------------
>
>                 Key: IMPALA-14447
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14447
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Riza Suminto
>            Priority: Major
>
> When a query accesses multiple unloaded tables, their metadata loading is
> triggered sequentially in local catalog mode. The stack trace of the
> coordinator thread:
> {noformat}
> "Thread-20 [LoadWithCaching for table metadata for default.part_900k_parq1]" #112 prio=5 os_prio=0 tid=0x000000000aa08000 nid=0x27f7 runnable [0x00007fcb9afe5000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.impala.service.FeSupport.NativeGetPartialCatalogObject(Native Method)
>         at org.apache.impala.service.FeSupport.GetPartialCatalogObject(FeSupport.java:472)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:463)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:209)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:815)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider$4.call(CatalogdMetaProvider.java:807)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:601)
>         at org.apache.impala.catalog.local.CatalogdMetaProvider.loadTable(CatalogdMetaProvider.java:803)
>         at org.apache.impala.catalog.local.LocalTable.loadTableMetadata(LocalTable.java:164)
>         at org.apache.impala.catalog.local.LocalTable.load(LocalTable.java:114)
>         at org.apache.impala.catalog.local.LocalDb.getTable(LocalDb.java:148)
>         at org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:323)
>         at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:176)
>         at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:145)
>         at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2600)
>         at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2295)
>         at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2032)
>         at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:171)
> {noformat}
> For an unloaded table, metadata loading is triggered by the first call to 
> LocalDb.getTable(). The metadata loading of the second unloaded table is 
> triggered only after the first one completes.
> This is a performance regression compared to the legacy catalog mode, where 
> metadata loading for all tables accessed by a query is triggered in parallel.
> CC [~rizaon]


