[
https://issues.apache.org/jira/browse/IMPALA-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016395#comment-18016395
]
ASF subversion and git services commented on IMPALA-14283:
----------------------------------------------------------
Commit 774c01c2b50ea9108da5fd9e0fb608e1b903071f in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=774c01c2b ]
IMPALA-14306: Frontend.getDataSrcs() should retry for
InconsistentMetadataFetchException
After IMPALA-14283, coordinator will throw an
InconsistentMetadataFetchException when witness catalogd service ID
changes. The Frontend code should retry the request to use fresh
metadata. This patch resolves the TODO item in Frontend.getDataSrcs() to
add the retry as other methods do. Also change the catch clause in
LocalCatalog.getDataSources() so InconsistentMetadataFetchException can
be thrown from it.
Tests:
- Ran TestExtDataSources.test_catalogd_ha_failover 100 times.
Change-Id: I483423680a5c953aaf3446b62c8b8ee08d6c6385
Reviewed-on: http://gerrit.cloudera.org:8080/23347
Reviewed-by: Wenzhe Zhou <[email protected]>
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Coordinator could use stale metadata in catalogd warm failover
> --------------------------------------------------------------
>
> Key: IMPALA-14283
> URL: https://issues.apache.org/jira/browse/IMPALA-14283
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 5.0.0
>
> Attachments: catalogd.INFO, catalogd_node1.INFO, impalad.INFO,
> show_parts_profile.txt, statestored.INFO
>
>
> There is a corner case in local catalog mode that when catalogd warm failover
> finishes, coordinator fetches the new table object from the new active
> catalogd and it might coincidently "activate" the stale metadata in the
> cache. This happens when matching all the following conditions:
> * The new table object has a catalog version that has been used in the
> previous active catalogd and it was assigned to the same table name.
> * The coordinator has loaded table metadata associated to that version from
> the previous active catalogd, e.g. partition list, partitions or column stats
> are loaded and still in the cache (not aged out or evicted yet).
> * The initial catalog update of the new active catalogd hasn't arrived yet.
> There is some delay when statestore broadcast the updates. If the statestore
> update arrives before the query planning, we are safe since processing the
> update will clear the cache.
> This bug is found by *TestCatalogdHA.test_warmed_up_metadata_failover_catchup*
> {code:python}
> tests/custom_cluster/test_catalogd_ha.py:640: in
> test_warmed_up_metadata_failover_catchup
>
>
> db, self._add_new_partition, self._verify_new_partition)
>
>
> tests/custom_cluster/test_catalogd_ha.py:762: in
> _test_metadata_after_failover
>
>
> assert str(active_catalogd.service.service_port) in str(e)
>
>
> E assert '26001' in "assert 'year=2025/month=1' in
> 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'"
>
>
> E + where '26001' = str(26001)
>
>
> E + where 26001 = <tests.common.impala_service.CatalogdService object
> at 0x7f5d8c24fd10>.service_port
>
> E + where <tests.common.impala_service.CatalogdService object at
> 0x7f5d8c24fd10> = <CatalogdProcess PID: None
> (/root/Impala-upstream/be/build/latest/service/cata...te_store_subscriber_port=23021
> -webserver_
> port=25021 -enable_catalogd_ha=true)>.service
>
>
> E + and "assert 'year=2025/month=1' in
> 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'" = str(AssertionError(u"assert
> 'year=2025/month=1' in 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'",)){code}
> The test runs an AddPartition statement and kills the active catalogd, then
> runs ShowPartitions in the new active catalogd. It fails since coordinator
> returns an empty partition list. Test logs:
> {noformat}
> -- 2025-07-31 17:10:00,755 INFO MainThread: hs2: executing against Impala
> at quanlong-ha-test.vpc.cloudera.com:21050. session:
> ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
>
>
>
>
> alter table warmup_test_db.tbl add partition(year=2025, month=1);
>
>
>
>
>
> -- 2025-07-31 17:10:01,119 INFO MainThread:
> b84a8a27014a3340:af77289200000000: query started
>
>
> -- 2025-07-31 17:10:01,120 INFO MainThread:
> b84a8a27014a3340:af77289200000000: getting log for operation
>
>
> -- 2025-07-31 17:10:01,120 INFO MainThread:
> b84a8a27014a3340:af77289200000000: getting runtime profile operation
>
>
> -- 2025-07-31 17:10:01,120 INFO MainThread:
> b84a8a27014a3340:af77289200000000: closing query for operation
>
>
> -- 2025-07-31 17:10:01,147 INFO MainThread: Found PID 102378 for
> /root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 -v=1
> -max_log_files=10 -log_rotation_match_pid=true -log_filename=catalogd_n
> ode1 -log_dir=/tmp/ -kudu_master_hosts localhost --catalog_topic_mode=minimal
> --catalogd_ha_reset_metadata_on_failover=false
> --debug_actions=catalogd_event_processing_delay:SLEEP@1000
> --enable_reload_events=true --
> warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
> -catalog_service_port=26001 -state_store_subscriber_port=23021
> -webserver_port=25021 -enable_catalogd_ha=true
> -- 2025-07-31 17:10:01,172 INFO MainThread: Killing <CatalogdProcess PID:
> 102378 (/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5
> -v=1 -max_log_files=10 -log_rotation_match_pid=true -log_fi
> lename=catalogd_node1 -log_dir=/tmp/ -kudu_master_hosts localhost
> --catalog_topic_mode=minimal --catalogd_ha_reset_metadata_on_failover=false
> --debug_actions=catalogd_event_processing_delay:SLEEP@1000 --enable_relo
> ad_events=true
> --warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
> -catalog_service_port=26001 -state_store_subscriber_port=23021
> -webserver_port=25021 -enable_catalogd_ha=
> true)> with signal 9
>
>
> -- 2025-07-31 17:10:01,177 INFO MainThread: Getting metric:
> catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020
>
> -- 2025-07-31 17:10:01,180 INFO MainThread: Waiting for metric value
> 'catalog-server.active-status'=True. Current value: False. total_wait: 0s
> -- 2025-07-31 17:10:01,180 INFO MainThread: Sleeping 1s before next
> retry.
>
> -- 2025-07-31 17:10:02,181 INFO MainThread: Getting metric:
> catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020
>
> -- 2025-07-31 17:10:02,184 INFO MainThread: Metric
> 'catalog-server.active-status' has reached desired value: True. total_wait:
> 1.00396800041s
> -- 2025-07-31 17:10:02,188 INFO MainThread: hs2: executing against Impala
> at quanlong-ha-test.vpc.cloudera.com:21050. session:
> ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
>
>
>
> show partitions warmup_test_db.tbl;
>
>
>
> -- 2025-07-31 17:10:03,737 INFO MainThread:
> 664e4d97f4712dfb:e4d0877500000000: query started
> -- 2025-07-31 17:10:03,738 INFO MainThread:
> 664e4d97f4712dfb:e4d0877500000000: getting log for operation
> -- 2025-07-31 17:10:03,738 INFO MainThread:
> 664e4d97f4712dfb:e4d0877500000000: getting runtime profile operation
> -- 2025-07-31 17:10:03,738 INFO MainThread:
> 664e4d97f4712dfb:e4d0877500000000: closing query for operation
> -- 2025-07-31 17:10:03,739 INFO MainThread: partition result:
> ['Total\t\t-1\t0\t0B\t0B\t\t\t\t\t']{noformat}
> Profile of ShowPartitions indicates some stale metadata are used:
> {noformat}
> Original Table Versions: warmup_test_db.tbl, 314, 1754007002807, Thu
> Jul 31 17:10:02 PDT 2025
> - CatalogFetch.ColumnStats.Hits: 13 <--- This shouldn't hit
> - CatalogFetch.ColumnStats.Misses: 0
> - CatalogFetch.ColumnStats.Requests: 13
> - CatalogFetch.ColumnStats.Time: 0
> - CatalogFetch.DatabaseList.Hits: 1
> - CatalogFetch.DatabaseList.Misses: 0
> - CatalogFetch.DatabaseList.Requests: 1
> - CatalogFetch.DatabaseList.Time: 0
> - CatalogFetch.PartitionLists.Hits: 1 <--- This shouldn't hit
> - CatalogFetch.PartitionLists.Misses: 0
> - CatalogFetch.PartitionLists.Requests: 1
> - CatalogFetch.PartitionLists.Time: 0
> - CatalogFetch.RPCs.Bytes: 1.78 KB (1818)
> - CatalogFetch.RPCs.Requests: 2
> - CatalogFetch.RPCs.Time: 1s531ms
> - CatalogFetch.StorageLoad.Time: 22ms
> - CatalogFetch.TableList.Hits: 1
> - CatalogFetch.TableList.Misses: 1
> - CatalogFetch.TableList.Requests: 2
> - CatalogFetch.TableList.Time: 1s516ms
> - CatalogFetch.Tables.Hits: 0
> - CatalogFetch.Tables.Misses: 1
> - CatalogFetch.Tables.Requests: 1
> - CatalogFetch.Tables.Time: 15ms
> - ExecutorGroupsConsidered: 1 (1){noformat}
> In logs of the new active catalogd (catalogd.INFO), the catalog version of
> the table is 314:
> {noformat}
> I20250731 17:10:02.760269 103771 CatalogOpExecutor.java:4872] Found 1/1
> partitions to add in table warmup_test_db.tbl from event 888
> I20250731 17:10:02.807034 103771 HdfsTable.java:781] Loaded file and block
> metadata for warmup_test_db.tbl partitions: year=2025/month=1. Time taken:
> 27.856ms
> I20250731 17:10:02.807621 103771 MetastoreEvents.java:860] EventId: 888
> EventType: ADD_PARTITION Target: warmup_test_db.tbl. Successfully added 1
> partitions to table warmup_test_db.tbl
> I20250731 17:10:02.807871 103771 MetastoreEventsProcessor.java:1578] Time
> elapsed in processing event batch: 1s156ms
> I20250731 17:10:03.673661 103802 JniUtil.java:167] getEventProcessorSummary
> request: Getting event processor summary
> I20250731 17:10:03.674260 103802 JniUtil.java:178] Finished
> getEventProcessorSummary request: Getting event processor summary. Time
> spent: 1ms
> I20250731 17:10:03.674295 103802 catalog-server.cc:978] EventProcessor is
> synced up with HMS event id during failover: 888
> I20250731 17:10:03.674309 103802 catalog-server.cc:887] This catalogd
> instance is changed to active status
> I20250731 17:10:03.677537 103773 JniUtil.java:167] getCatalogDelta request:
> Getting catalog delta from version 0
> I20250731 17:10:03.687700 103773 catalog-server.cc:1615] Collected update:
> 2:DATABASE:default, version=2, original size=48, compressed size=49
> ...
> I20250731 17:10:03.720110 103773 CatalogServiceCatalog.java:1965] Skipped 0
> partitions of table warmup_test_db.tbl in the incremental update
> I20250731 17:10:03.720285 103773 CatalogServiceCatalog.java:1978] Collected 1
> partition update(s): 2:HDFS_PARTITION:warmup_test_db.tbl:year=2025/month=1,
> version=314, original size=100, compressed size=101
> I20250731 17:10:03.720324 103773 catalog-server.cc:1615] Collected update:
> 2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed
> size=66{noformat}
> In the previous catalogd logs (catalogd_node1.INFO), the table version before
> AddPartition is also 314!
> {noformat}
> I20250731 17:09:58.553951 103647 HdfsTable.java:1256] Loading metadata for
> table definition and all partition(s) of warmup_test_db.tbl (needed by
> coordinator)
> I20250731 17:09:58.600651 103647 HdfsTable.java:1919] Loaded 13 columns from
> HMS. Actual columns: 13
> I20250731 17:09:58.623989 103647 HdfsTable.java:2987] Load Valid Write Id
> List Done. Time taken: 24.253us
> I20250731 17:09:58.624092 103647 HdfsTable.java:1303] Fetching partition
> metadata from the Metastore: warmup_test_db.tbl
> I20250731 17:09:58.628580 103647 MetaStoreUtil.java:191] Fetching 0
> partitions for: warmup_test_db.tbl using partition batch size: 1000
> I20250731 17:09:58.628685 103647 HdfsTable.java:1310] Fetched partition
> metadata from the Metastore: warmup_test_db.tbl
> I20250731 17:09:58.649619 103647 HdfsTable.java:781] Loaded file and block
> metadata for warmup_test_db.tbl partitions: . Time taken: 3.319ms
> I20250731 17:09:58.651077 103647 Table.java:1106] last refreshed event id for
> table: warmup_test_db.tbl set to: -1
> I20250731 17:09:58.651185 103647 TableLoader.java:187] Loaded metadata for:
> warmup_test_db.tbl (129ms)
> I20250731 17:09:58.685761 102954 JniUtil.java:167] getCatalogDelta request:
> Getting catalog delta from version 313
> I20250731 17:09:58.693717 102954 CatalogServiceCatalog.java:1965] Skipped 0
> partitions of table warmup_test_db.tbl in the incremental update
> I20250731 17:09:58.693792 102954 catalog-server.cc:1615] Collected update:
> 2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed
> size=66{noformat}
> So coordinator has loaded the empty partition list of it in cache. It's
> associated with table name "warmup_test_db.tbl" and catalog version 314. Then
> coincidently reused after catalogd warm failover.
> We should add the catalog service id in the getPartialCatalogObject response
> and coordinator should clear the cache when the id changes, just like what we
> do in updateCatalogCache() ->
> [witnessCatalogServiceId()|https://github.com/apache/impala/blob/5414c3009890a736901b6ae3b56b9953b1992de7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L1614].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]