Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23804 )

Change subject: IMPALA-13863: Add metric to track number of loaded tables in 
catalogd
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/23804/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/23804/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2823
PS1, Line 2823:       // Update loaded tables counter based on table state 
transitions.
              :       if (existingTbl != null) {
              :         boolean wasLoaded = !(existingTbl instanceof 
IncompleteTable);
              :         boolean isLoaded = !(table instanceof IncompleteTable);
              :
              :         if (wasLoaded && !isLoaded) {
              :           numLoadedTables_.decrementAndGet();
              :           LOG.trace("Replaced loaded table {} with 
IncompleteTable, " +
              :               "loaded tables count: {}",
              :               table.getFullName(), numLoadedTables_.get());
              :         } else if (!wasLoaded && isLoaded) {
              :           numLoadedTables_.incrementAndGet();
              :           LOG.trace("Replaced IncompleteTable with loaded table 
{}, " +
              :               "loaded tables count: {}",
              :               table.getFullName(), numLoadedTables_.get());
              :         }
              :       } else if (!(table instanceof IncompleteTable)) {
              :         // New table being added and it's loaded
              :         numLoadedTables_.incrementAndGet();
              :         LOG.trace("Added new loaded table {}, loaded tables 
count: {}",
              :             table.getFullName(), numLoadedTables_.get());
              :       }
I see we are adding codes like this before all the call sites of Db#addTable(). 
It has a high risk of omission when other developers adding new codes that use 
Db#addTable(). It'd be better to move this code into Db#addTable(). The right 
place might be in  CatalogObjectCache#add() which is a synchronized method. We 
can extend this method to add an optional argument for an update function which 
checks the existing and new items and update the counter.

  public synchronized boolean add(T catalogObject, BiConsumer<T, T> updateFunc) 
{

CatalogObjectCache#remove() also needs an update function.


http://gerrit.cloudera.org:8080/#/c/23804/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/23804/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@854
PS1, Line 854: removeTable
I think we should update the counter here if tblToBeRemoved is a loaded table. 
One scenario is loading a table in Impala and then drop it in Hive. Processing 
the DROP_TABLE event uses this method.


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py
File tests/custom_cluster/test_automatic_invalidation.py:

http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@105
PS1, Line 105:   def _get_metric_value(self, metric_name):
             :     """Helper to get a metric value from catalogd metrics 
endpoint."""
             :     import json
             :     import requests
             :     url = "http://localhost:25020/metrics?json";
             :     response = requests.get(url)
             :     metrics_json = json.loads(response.text)
             :     for metric in metrics_json["metric_group"]["metrics"]:
             :       if metric["name"] == metric_name:
             :         return int(metric["value"])
             :     return None
Can we use the existing get_metric_value() method? 
https://github.com/apache/impala/blob/85d77b908b12ae3d3f48ed5d49f38fb3832edc4e/tests/common/impala_service.py#L101-L114

e.g. self.cluster.catalogd.service.get_metric_value(metric_name) like
https://github.com/apache/impala/blob/85d77b908b12ae3d3f48ed5d49f38fb3832edc4e/tests/custom_cluster/test_custom_statestore.py#L230-L231


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@122
PS1, Line 122:        and removal."""
There are some scenarios that worth testing:

* Reload a loaded table, e.g. REFRESH, ALTER statements.
* Rename a loaded table.


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@125
PS1, Line 125:     time.sleep(2)
Why do we need this sleep?


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@156
PS1, Line 156:     assert count_after_global_invalidate < count_with_table, (
Shouldn't we expect count_after_global_invalidate == 0 and count_with_table == 
1?


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@164
PS1, Line 164:     count_after_create = 
self._get_metric_value("catalog.num-loaded-tables")
Can we assert count_after_create == 0?


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@207
PS1, Line 207:     assert count_before_timeout >= 1, (
Can we assert count_before_timeout == 1, or what other tables could be loaded 
at this point?


http://gerrit.cloudera.org:8080/#/c/23804/1/tests/custom_cluster/test_automatic_invalidation.py@220
PS1, Line 220:     assert count_after_timeout < count_before_timeout, (
Can we assert count_after_timeout == 0?



--
To view, visit http://gerrit.cloudera.org:8080/23804
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5aa54f9f7507709b654df22e24592799811e8b6c
Gerrit-Change-Number: 23804
Gerrit-PatchSet: 1
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 25 Dec 2025 13:29:12 +0000
Gerrit-HasComments: Yes

Reply via email to