Quanlong Huang created KUDU-3649:
------------------------------------
Summary: Last processed Hive Metastore notification event ID is
not loaded correctly
Key: KUDU-3649
URL: https://issues.apache.org/jira/browse/KUDU-3649
Project: Kudu
Issue Type: Bug
Components: server
Reporter: Quanlong Huang
Attachments: kudu-debug.patch,
kudu-master.quanlong-OptiPlex-BJ.quanlong.log.INFO.20250309-094427.22679
While launching kudu-master with Hive Metastore integration enabled, I don't
see the log message emitted by the following code:
{code:cpp}
if (hms_catalog_) {
  static const char* const kNotificationLogEventIdDescription =
      "Loading latest processed Hive Metastore notification log event ID";
  LOG(INFO) << kNotificationLogEventIdDescription << "...";{code}
https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1446-L1447
The Kudu version used in Impala's test env is commit e742f86f6. I patched Kudu
with [^kudu-debug.patch] to add more debug logs, and then saw the following
logs:
{noformat}
I20250309 09:44:28.930799 22774 catalog_manager.cc:1434] Initializing
in-progress tserver states...
I20250309 09:44:28.930833 22774 catalog_manager.cc:1455] hms_catalog_ is nullptr
I20250309 09:44:28.930859 22765 hms_catalog.cc:109] Initializing HmsCatalog
I20250309 09:44:28.931007 22790 hms_notification_log_listener.cc:222]
durable_event_id = -1, batch_size = 100
I20250309 09:44:28.936439 22788 hms_client.cc:369] Fetching 100 HMS events from
id -1{noformat}
This means there is a race between initializing hms_catalog_
[here|https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1043]
and using it
[here|https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1444].
As a result, hms_notification_log_event_id_ is never loaded and keeps its
initial value of -1, so the hms-notification-log-listener thread in kudu-master
starts fetching all HMS notification events from the beginning. In my local
env, many of these events have large message bodies, which causes HMS to run
out of memory while serving the requests. So
HmsNotificationLogListenerTask::Poll() never succeeds and keeps polling events
from id -1. I have attached the master log:
[^kudu-master.quanlong-OptiPlex-BJ.quanlong.log.INFO.20250309-094427.22679]
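To illustrate the ordering problem described above, here is a minimal, single-threaded sketch. It is not Kudu's code: FakeCatalogManager, FakeHmsCatalog, InitHmsCatalog(), LoadEventIdIfCatalogPresent() and StartingEventId() are hypothetical names, and the persisted value 12345 is made up. It only models the assumption that the load is guarded by a null check on the catalog pointer, so running the load before the pointer is set leaves the event ID at -1.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for illustration only; not Kudu's real classes.
struct FakeHmsCatalog {};

struct FakeCatalogManager {
  // Assumed value persisted by an earlier run (made up for this demo).
  int64_t persisted_event_id = 12345;
  // In-memory copy, which starts at -1 as described in the report.
  int64_t hms_notification_log_event_id = -1;
  std::unique_ptr<FakeHmsCatalog> hms_catalog;

  // Models the step that creates the HMS catalog object.
  void InitHmsCatalog() { hms_catalog = std::make_unique<FakeHmsCatalog>(); }

  // Models the guarded load: it silently does nothing while the pointer is
  // still null, which matches the "hms_catalog_ is nullptr" debug log.
  bool LoadEventIdIfCatalogPresent() {
    if (!hms_catalog) return false;
    hms_notification_log_event_id = persisted_event_id;
    return true;
  }
};

// Returns the event ID a listener would start polling from, depending on the
// order in which the two steps happen to run.
int64_t StartingEventId(bool init_before_load) {
  FakeCatalogManager m;
  assert(m.hms_notification_log_event_id == -1);  // default before any load
  if (init_before_load) {
    m.InitHmsCatalog();
    m.LoadEventIdIfCatalogPresent();
  } else {
    m.LoadEventIdIfCatalogPresent();  // runs too early, so it is skipped
    m.InitHmsCatalog();
  }
  return m.hms_notification_log_event_id;
}
```

In this model, StartingEventId(true) returns the persisted value, while StartingEventId(false) returns -1 even though the catalog object exists by the time polling would begin.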
Because of this, creating managed tables in Kudu fails with the following
error. IMPALA-13846 is an example.
{code}
failed to wait for Hive Metastore notification log listener to catch up: failed
to retrieve notification log events: failed to get Hive Metastore next
notification: No more data to read.{code}