[
https://issues.apache.org/jira/browse/IMPALA-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang reassigned IMPALA-14074:
---------------------------------------
Assignee: Quanlong Huang
Description:
Users might want to warm up the metadata cache for some important tables after
starting catalogd. The warmup can take hours in a large warehouse. When Catalogd
HA is enabled and a failover happens, the standby catalogd becomes active with
a cold cache. This requires users to warm up the cache again. During this
period, queries suffer from poor performance and might fail with timeouts.
We can provide a configuration file, loaded at catalogd startup, that lists
one table per line, e.g. --preload_metadata_table_list_file=table_list.txt.
Catalogd triggers metadata loading of those tables in the background, so
users don't need to explicitly run queries to warm up the cache. Since the
tables are loaded, HMS notification events on them won't be skipped in the
standby catalogd. In this way, the standby catalogd can keep the cache
up-to-date.
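As an illustration only, the table list file could be read along these lines.
This is a sketch, not catalogd code: the function name parse_table_list is
hypothetical, and the handling of blank lines, '#' comments, unqualified names
and duplicates is an assumption, since the proposal only says the file has one
table per line.

```python
from typing import Iterable, List, Tuple


def parse_table_list(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (db, table) pairs from lines of the form 'db.table'."""
    tables = []
    seen = set()
    for raw in lines:
        name = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not name or name.startswith("#"):
            continue
        db, sep, tbl = name.partition(".")
        # Assumption: only fully qualified 'db.table' names are valid.
        if not sep or not db or not tbl or "." in tbl:
            continue
        # Assumption: names are compared case-insensitively.
        key = (db.lower(), tbl.lower())
        if key not in seen:
            seen.add(key)
            tables.append(key)
    return tables
```

With a real file this would be called as parse_table_list(open(path)), where
path is the value given to the startup flag.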
These tables will be added to the table loading queue, in the same way as
PrioritizeLoadRequest calls triggered by queries. So the concurrency is still
controlled by num_metadata_loading_threads.
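The idea of a shared queue drained by a fixed number of loader threads can be
sketched as below. This is a simplified model, not the catalogd
implementation: run_loading_queue and load_table are hypothetical names, and
num_loading_threads merely stands in for num_metadata_loading_threads.

```python
import queue
import threading


def run_loading_queue(requests, load_table, num_loading_threads=2):
    """Drain a shared queue of table names with a fixed number of workers."""
    pending = queue.Queue()
    for name in requests:
        pending.put(name)

    loaded = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # Nothing left to load.
            load_table(name)
            with lock:
                loaded.append(name)

    # Concurrency is bounded by the number of worker threads, regardless of
    # how many tables were queued for preloading.
    threads = [threading.Thread(target=worker) for _ in range(num_loading_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded
```

Because preload requests and query-triggered requests would share one queue,
a long table list cannot start more loads at once than the thread limit
allows.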
Catalogd should expose metrics to indicate whether the loading of these tables
is done, e.g. num-preload-metadata-tasks for all valid table names in the list,
and num-preload-metadata-tasks-done for loaded tables. When these two metrics
are equal, the warmup is done.
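A monitoring script could apply the check roughly as follows. The metric names
are the ones proposed above; the function name warmup_done and the assumption
that the metrics arrive as a plain name-to-value mapping are illustrative, as
the proposal does not say how the metrics are fetched.

```python
def warmup_done(metrics):
    """Return True when every preload task has finished.

    `metrics` is assumed to be a dict of metric name to numeric value.
    """
    total = metrics.get("num-preload-metadata-tasks")
    done = metrics.get("num-preload-metadata-tasks-done")
    # If either metric is missing, the state is unknown, so report not done.
    if total is None or done is None:
        return False
    return done == total
```

Under this sketch an empty table list (both metrics 0) counts as warmed up.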
Metadata warmup should also happen after global INVALIDATE METADATA, similar
to startup.
was:
Users might want to warm up the metadata cache for some important tables after
restarting catalogd. We can provide a configuration file, loaded at catalogd
startup, that lists one table per line, e.g.
--preload_metadata_table_list_file=table_list.txt. Catalogd triggers metadata
loading of those tables in the background, so users don't need to explicitly
run queries to warm up the cache.
These tables will be added to the table loading queue, in the same way as
PrioritizeLoadRequest calls triggered by queries. So the concurrency is still
controlled by num_metadata_loading_threads.
Catalogd should expose metrics to indicate whether the loading of these tables
is done. E.g. num-preload-metadata-tasks for all valid table names in the list,
and num-preload-metadata-tasks-done for loaded tables. When these two metrics
are equal, the warmup is done.
Metadata warmup also happens after global INVALIDATE METADATA, similar to
startup.
Issue Type: Bug (was: New Feature)
Summary: Standby catalogd should warmup the metadata cache (was:
Support metadata cache warmup of some tables in catalogd startup)
> Standby catalogd should warmup the metadata cache
> -------------------------------------------------
>
> Key: IMPALA-14074
> URL: https://issues.apache.org/jira/browse/IMPALA-14074
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Users might want to warm up the metadata cache for some important tables
> after starting catalogd. The warmup can take hours in a large warehouse. When
> Catalogd HA is enabled and a failover happens, the standby catalogd becomes
> active with a cold cache. This requires users to warm up the cache again.
> During this period, queries suffer from poor performance and might fail with
> timeouts.
> We can provide a configuration file, loaded at catalogd startup, that lists
> one table per line, e.g. --preload_metadata_table_list_file=table_list.txt.
> Catalogd triggers metadata loading of those tables in the background, so
> users don't need to explicitly run queries to warm up the cache. Since
> the tables are loaded, HMS notification events on them won't be skipped in
> the standby catalogd. In this way, the standby catalogd can keep the cache
> up-to-date.
> These tables will be added to the table loading queue, in the same way as
> PrioritizeLoadRequest calls triggered by queries. So the concurrency is still
> controlled by num_metadata_loading_threads.
> Catalogd should expose metrics to indicate whether the loading of these
> tables is done, e.g. num-preload-metadata-tasks for all valid table names in
> the list, and num-preload-metadata-tasks-done for loaded tables. When these
> two metrics are equal, the warmup is done.
> Metadata warmup should also happen after global INVALIDATE METADATA, similar
> to startup.