Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22816 )

Change subject: IMPALA-13829: Postpone catalog deleteLog GC for waitForHmsEvent 
requests
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22816/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22816/2//COMMIT_MSG@33
PS2, Line 33: A new flag, catalog_delete_log_gc_frequency, is added for this. 
The
            : deleteLog GC happens in every N+1 
(N=catalog_delete_log_gc_frequency)
            : topic updates.
That's a fair concern. The majority of the memory footprint of TopicUpdateLog 
and DeleteLog are table names and partition names. But one difference is 
TopicUpdateLog deduplicate them by using the names (i.e. catalogObjectKey) as 
the map key. In DeleteLog, the map key is the catalog version (in Long type), 
if a table is updated frequently, its removed objects could pile up there. We 
might need to tune catalog_delete_log_gc_frequency separately. In my local 
tests, even running with catalog_delete_log_gc_frequency=1 makes the failed 
test succeed stably. Maybe 1000 is too high. Need some stress tests to decide a 
good default.

On the other side, table names and partition names are generally not the main 
components of catalogd memory usage. The primary portions are usually file 
metadata and incremental stats. Maybe the increased memory usage on deleteLog 
is trivial to the total heap size.

> However, would it be better if GC happen every topic update, but only GC log 
> from 1 hour behind to avoid OOM?

I think this can GC the removed objects faster. But might have the same effect 
as reducing catalog_delete_log_gc_frequency.


http://gerrit.cloudera.org:8080/#/c/22816/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/22816/4/be/src/catalog/catalog-server.cc@243
PS4, Line 243: DECLARE_string(state_store_2_host);
> This is a duplicate line.
Oops, don't know how I added this..


http://gerrit.cloudera.org:8080/#/c/22816/4/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

http://gerrit.cloudera.org:8080/#/c/22816/4/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@72
PS4, Line 72: oldest
> Isn't it actually the youngest topic update that is GC'ed, i.e. all topic u
I just keep the name used in TopicUpdateLog:
https://github.com/apache/impala/blob/5e39afc2d90a606ce05de376af5cd54035f7f99a/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java#L56

I think it means the oldest topic update that we can use in the GC. In other 
words, the oldest catalog version that we can use in the next call of 
garbageCollectInternal().


http://gerrit.cloudera.org:8080/#/c/22816/4/tests/metadata/test_event_processing.py
File tests/metadata/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/22816/4/tests/metadata/test_event_processing.py@627
PS4, Line 627: < latest catalog version
> If I understand it correctly, this assumes the behaviour before this patch.
Done


http://gerrit.cloudera.org:8080/#/c/22816/4/tests/metadata/test_event_processing.py@633
PS4, Line 633: t
> Optional: in my opinion, if we're not using the resulting lists, for loops
Done



--
To view, visit http://gerrit.cloudera.org:8080/22816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2441440bca2b928205dd514047ba742a5e8bf05e
Gerrit-Change-Number: 22816
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Sat, 26 Apr 2025 03:07:54 +0000
Gerrit-HasComments: Yes

Reply via email to