[ 
https://issues.apache.org/jira/browse/IMPALA-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13178:
---------------------------------------

    Assignee:     (was: Quanlong Huang)

> Flush the metadata cache to remote storage instead of just invalidating them 
> in full GCs
> ----------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13178
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13178
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-2024
>
> When invalidate_tables_on_memory_pressure is enabled, catalogd invalidates 
> 10% of the tables (configured by invalidate_tables_fraction_on_memory_pressure) 
> if the JVM old-gen usage still exceeds 60% (configured by 
> invalidate_tables_gc_old_gen_full_threshold) after a full GC.
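
To make the trigger condition concrete, here is a minimal sketch in Python rather than the catalogd code itself. The function name, its parameters, the rounding with ceil, and the least-recently-used ordering are assumptions made for illustration; the issue text only states the two fractions and the flags that configure them.

```python
import math


def pick_tables_to_invalidate(old_gen_used, old_gen_max, last_used,
                              full_threshold=0.6, fraction=0.1):
    """Return the table names to invalidate after a full GC.

    last_used maps table name -> last access time (hypothetical input).
    full_threshold mirrors invalidate_tables_gc_old_gen_full_threshold,
    fraction mirrors invalidate_tables_fraction_on_memory_pressure.
    """
    if old_gen_max <= 0:
        return []
    # Only act if old-gen usage still exceeds the threshold after the GC.
    if old_gen_used / old_gen_max <= full_threshold:
        return []
    # Assumption: round up so that at least one table is picked.
    count = math.ceil(len(last_used) * fraction)
    # Assumption: pick the least recently used tables first.
    oldest_first = sorted(last_used, key=last_used.get)
    return oldest_first[:count]
```

With 10 tables and old-gen usage at 70%, this picks the single least recently used table; at 50% usage it picks nothing.
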
> If an invalidated table is used again later, catalogd has to reload its 
> metadata. The loading process can itself lead to OOM (see IMPALA-13117).
> On the other hand, the metadata might not have changed, in which case 
> evicting and reloading it is wasted work: fetching all the partitions from 
> HMS and listing files on the storage are both expensive. It'd be better to 
> flush out the metadata cache of a table instead of just invalidating it. If 
> there are no more invalidates on the table (either implicit ones from HMS 
> event processing or explicit ones from user commands), we can reuse the 
> flushed metadata.
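
The reuse rule in the paragraph above could be sketched as follows. This is only an illustration under stated assumptions: `FlushedMetadataCache` and its methods are hypothetical names, and a plain dict stands in for the remote storage.

```python
class FlushedMetadataCache:
    """Reuse flushed metadata unless the table was invalidated afterwards."""

    def __init__(self, slow_loader):
        # slow_loader stands in for the expensive reload path
        # (fetching partitions from HMS plus file listing).
        self._slow_loader = slow_loader
        # Stand-in for remote storage: table name -> flushed metadata.
        self._flushed = {}

    def flush(self, table, metadata):
        # Called under memory pressure instead of dropping the metadata.
        self._flushed[table] = metadata

    def invalidate(self, table):
        # An implicit (HMS event) or explicit (user command) invalidate
        # makes the flushed copy stale, so it must not be reused.
        self._flushed.pop(table, None)

    def load(self, table):
        # Reuse the flushed copy if it is still valid, else reload.
        if table in self._flushed:
            return self._flushed[table]
        return self._slow_loader(table)
```

A table that is flushed and then read again skips the slow loader entirely; the slow path is only taken once an invalidate has arrived in between.
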
> The metadata can be flushed to remote storage (e.g. HDFS/Ozone/S3) so that 
> catalogd has practically unlimited space to use. We can consider flushing out 
> just the encodedFileDescriptors (the file metadata) and the incremental 
> stats, which usually make up the majority of the metadata cache. Or use a 
> well-defined format (e.g. Iceberg manifest files) so we can flush the 
> metadata incrementally even with catalog changes (DDLs/DMLs).
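
A rough sketch of the "flush only the heavy parts" option, assuming the metadata of a table can be viewed as a dict. The field names, the JSON encoding, and the local directory standing in for HDFS/Ozone/S3 are all assumptions for illustration; the real encodedFileDescriptors are binary and would need a different encoding.

```python
import json
import os
import tempfile

# Hypothetical field names for the parts that dominate memory usage.
HEAVY_FIELDS = ("encoded_file_descriptors", "incremental_stats")


def flush_heavy_fields(table_name, metadata, storage_dir):
    """Write the heavy fields to storage and return the light remainder."""
    heavy = {k: metadata[k] for k in HEAVY_FIELDS if k in metadata}
    light = {k: v for k, v in metadata.items() if k not in HEAVY_FIELDS}
    path = os.path.join(storage_dir, table_name + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(heavy, f)
    # Keep a pointer so the heavy fields can be restored on demand.
    light["flushed_path"] = path
    return light


def restore_heavy_fields(light):
    """Rebuild the full metadata from the light part and the flushed file."""
    restored = {k: v for k, v in light.items() if k != "flushed_path"}
    with open(light["flushed_path"], encoding="utf-8") as f:
        restored.update(json.load(f))
    return restored
```

Only the small remainder stays in memory after the flush, and a round trip through `restore_heavy_fields` gives back the original dict.
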



