[ 
https://issues.apache.org/jira/browse/KUDU-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934056#comment-17934056
 ] 

Quanlong Huang commented on KUDU-3649:
--------------------------------------

Thanks for the detailed reply! I'm checking the test failures. Meanwhile, 
there is something that I can confirm:

In my env, this is not the initial setup of the Kudu HMS integration, but I 
might have shut down Kudu for some time. Here are some logs from while I was 
debugging the patch:
{noformat}
I20250309 11:17:18.558643 24245 catalog_manager.cc:1447] Loading latest 
processed Hive Metastore notification log event ID...
I20250309 11:17:18.558692 24245 catalog_manager.cc:5522] Last processed Hive 
Metastore notification event ID: 15337{noformat}
Without the patch, it always starts from -1.
{quote}I'd think of filing a JIRA with the HMS/Hive project to make it more 
robust then. Hitting OOM condition when trying to send out information on just 
100 events per batch is worrisome, indeed, even if the whole history of events 
is being fetched eventually.

I'm going to take a closer look to check if there is indeed a race there. 
Meanwhile, if you suspect the HMS OOM issue might be worked around by fetching 
HMS events in smaller batches, consider tweaking the 
--hive_metastore_notification_log_batch_size flag for Kudu master, setting it 
to a smaller value.
{quote}
Yeah, setting hive_metastore_notification_log_batch_size to a smaller value 
helps. HMS can generate some huge ALTER_PARTITION events that Kudu can't 
fetch in the default batch size (100). In later versions of Hive, clients can 
specify event types that they want to skip. Filed KUDU-3650 to improve this.
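
To illustrate why a smaller batch size can help, here is a minimal sketch. It is not Kudu or HMS code: Event, FetchBatch, CatchUp and the response-size cap (a stand-in for HMS running out of memory while building one response) are all hypothetical names and assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an HMS notification event: only the ID and the
// size of its message body matter for this illustration.
struct Event {
  int64_t id;
  size_t body_bytes;
};

// Hypothetical stand-in for the HMS side. Returns up to `batch_size` events
// with an ID greater than `after_id`, or std::nullopt if the combined message
// bodies would exceed `max_response_bytes` (an assumed model of the server
// hitting OOM while serving the request).
std::optional<std::vector<Event>> FetchBatch(const std::vector<Event>& log,
                                             int64_t after_id,
                                             size_t batch_size,
                                             size_t max_response_bytes) {
  std::vector<Event> batch;
  size_t total = 0;
  for (const Event& e : log) {
    if (e.id <= after_id) continue;          // already processed
    if (batch.size() == batch_size) break;   // batch is full
    total += e.body_bytes;
    if (total > max_response_bytes) return std::nullopt;  // simulated OOM
    batch.push_back(e);
  }
  return batch;
}

// Polls in batches until the log is drained. Returns the last processed
// event ID, or std::nullopt as soon as one batch fails.
std::optional<int64_t> CatchUp(const std::vector<Event>& log,
                               int64_t start_id,
                               size_t batch_size,
                               size_t max_response_bytes) {
  int64_t last = start_id;
  while (true) {
    auto batch = FetchBatch(log, last, batch_size, max_response_bytes);
    if (!batch) return std::nullopt;
    if (batch->empty()) return last;  // caught up
    last = batch->back().id;
  }
}
```

Under these assumptions, a log of 200 events of 1 MiB each with a 50 MiB cap cannot be fetched from id -1 in batches of 100, but can be in batches of 10. A single event larger than the cap would still fail at any batch size, which is why skipping unneeded event types is the more robust option.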

> Last processed Hive Metastore notification event ID is not loaded correctly
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-3649
>                 URL: https://issues.apache.org/jira/browse/KUDU-3649
>             Project: Kudu
>          Issue Type: Bug
>          Components: server
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: kudu-debug.patch, 
> kudu-master.quanlong-OptiPlex-BJ.quanlong.log.INFO.20250309-094427.22679
>
>
> While launching kudu-master with Hive Metastore integration enabled, I don't 
> see the following log:
> {code:cpp}
>     if (hms_catalog_) {
>       static const char* const kNotificationLogEventIdDescription =
>           "Loading latest processed Hive Metastore notification log event ID";
>       LOG(INFO) << kNotificationLogEventIdDescription << "...";{code}
> https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1446-L1447
> The Kudu version used in Impala's test env is commit e742f86f6. I tried 
> patching Kudu to add more debug logs with [^kudu-debug.patch]. Then I saw the 
> following logs:
> {noformat}
> I20250309 09:44:28.930799 22774 catalog_manager.cc:1434] Initializing 
> in-progress tserver states...
> I20250309 09:44:28.930833 22774 catalog_manager.cc:1455] hms_catalog_ is 
> nullptr
> I20250309 09:44:28.930859 22765 hms_catalog.cc:109] Initializing HmsCatalog
> I20250309 09:44:28.931007 22790 hms_notification_log_listener.cc:222] 
> durable_event_id = -1, batch_size = 100 
> I20250309 09:44:28.936439 22788 hms_client.cc:369] Fetching 100 HMS events 
> from id -1{noformat}
> This means there is a race between initializing hms_catalog_ 
> [here|https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1043]
>  and using it 
> [here|https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/master/catalog_manager.cc#L1444].
> hms_notification_log_event_id_ is not loaded correctly and keeps using its 
> original value -1. The hms-notification-log-listener thread in Kudu-master 
> will start fetching all HMS notification events due to this. In my local env, 
> there are lots of HMS notification events with large message bodies, which 
> causes HMS to hit OOM while serving these requests. So 
> HmsNotificationLogListenerTask::Poll() never succeeds and keeps polling events 
> from id -1. Attached the master logs: 
> [^kudu-master.quanlong-OptiPlex-BJ.quanlong.log.INFO.20250309-094427.22679]
> Due to this, creating managed tables in Kudu fails with the following 
> error. IMPALA-13846 is an example.
> {code}
> failed to wait for Hive Metastore notification log listener to catch up: 
> failed to retrieve notification log events: failed to get Hive Metastore next 
> notification: No more data to read.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
