[ 
https://issues.apache.org/jira/browse/IMPALA-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933575#comment-17933575
 ] 

Quanlong Huang commented on IMPALA-13846:
-----------------------------------------

In my local env, there are lots of HMS notification events from several years 
ago, even though I didn't set hive.metastore.event.db.listener.timetolive 
anywhere near that high:
{noformat}
$ psql -q -U hiveuser ${METASTORE_DB}
HMS_home_quanlong_workspace_Impala_cdp=> select count(*) from 
"NOTIFICATION_LOG";
  count  
---------
 1134925
(1 row)

HMS_home_quanlong_workspace_Impala_cdp=> select "EVENT_ID", 
to_timestamp("EVENT_TIME"), "EVENT_TYPE" from "NOTIFICATION_LOG" order by 
"EVENT_ID" limit 10;
 EVENT_ID |      to_timestamp      |  EVENT_TYPE   
----------+------------------------+---------------
  7990170 | 2022-12-26 15:22:59+08 | ADD_PARTITION
  7990171 | 2022-12-26 15:23:09+08 | ADD_PARTITION
  7990172 | 2022-12-26 15:23:20+08 | ADD_PARTITION
  7990173 | 2022-12-26 15:23:30+08 | ADD_PARTITION
  7990174 | 2022-12-26 15:23:41+08 | ADD_PARTITION
  7990175 | 2022-12-26 15:23:51+08 | ADD_PARTITION
  7990176 | 2022-12-26 15:24:01+08 | ADD_PARTITION
  7990177 | 2022-12-26 15:24:12+08 | ADD_PARTITION
  7990178 | 2022-12-26 15:24:23+08 | ADD_PARTITION
  7990179 | 2022-12-26 15:24:33+08 | ADD_PARTITION
(10 rows){noformat}
Thanks to [~dengzh]'s help, we found they are not cleaned up because the 
DB-Notification-Cleaner thread died (HIVE-28808).
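
As a quick sanity check on whether the cleaner is alive, one can count events 
older than hive.metastore.event.db.listener.timetolive (the default is 86400s, 
i.e. one day, if I'm not mistaken); a large count long after HMS startup 
suggests the cleaner thread is dead. A rough psql sketch against the same 
metastore DB:
{noformat}
-- events older than the (assumed) default 1-day TTL of
-- hive.metastore.event.db.listener.timetolive
select count(*) from "NOTIFICATION_LOG"
where to_timestamp("EVENT_TIME") < now() - interval '1 day';
{noformat}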

> test_drop_managed_kudu_table failed in CreateTable
> --------------------------------------------------
>
>                 Key: IMPALA-13846
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13846
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> custom_cluster/test_kudu.py::TestKuduHMSIntegration::test_drop_managed_kudu_table
>  consistently failed in my local env. The failure occurs when creating a Kudu 
> managed table with HMS integration enabled.
> {code}
> tests/custom_cluster/test_kudu.py:306: in test_drop_managed_kudu_table
>     PARTITIONS 3 STORED AS KUDU""" % (unique_database, impala_tbl_name))
> infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:343:
>  in execute
>     self._wait_to_finish()  # make execute synchronous
> infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:428:
>  in _wait_to_finish
>     raise OperationalError(resp.errorMessage)
> E   OperationalError: Query 3c4eebe048d531d7:97c6ea2f00000000 failed:
> E   ImpalaRuntimeException: Error creating Kudu table 
> 'test_drop_managed_kudu_table_a82c250c.foo'
> E   CAUSED BY: NonRecoverableException: failed to wait for Hive Metastore 
> notification log listener to catch up: failed to retrieve notification log 
> events: failed to get Hive Metastore next notification: No more data to 
> read.{code}
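> For reference, based on the truncated test snippet above and the CreateTable 
> request logged by the Kudu master below (single INT32 key column "a", hash 
> partitioned into 3 buckets), the DDL issued by the test is presumably along 
> these lines (a hypothetical reconstruction, not the verbatim test statement):
> {code}
> CREATE TABLE test_drop_managed_kudu_table_a82c250c.foo (a INT PRIMARY KEY)
>   PARTITION BY HASH (a) PARTITIONS 3 STORED AS KUDU;
> {code}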
> Catalogd logs show that the error is thrown from KuduClient.createTable(), 
> triggered by an error returned by the Kudu master (RpcProxy.dispatchMasterError):
> {noformat}
> E0309 09:16:50.324411 13234 JniUtil.java:184] 
> de42baedc07418aa:e535aa7000000000] Error in CREATE_TABLE 
> test_drop_managed_kudu_table_a82c250c.foo issued by quanlong. Time spent: 
> 10s139ms
> I0309 09:16:50.324739 13234 jni-util.cc:321] 
> de42baedc07418aa:e535aa7000000000] 
> org.apache.impala.common.ImpalaRuntimeException: Error creating Kudu table 
> 'test_drop_managed_kudu_table_a82c250c.foo'
>         at 
> org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:157)
>         at 
> org.apache.impala.service.CatalogOpExecutor.createKuduTable(CatalogOpExecutor.java:3885)
>         at 
> org.apache.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:3749)
>         at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:491)
>         at 
> org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:318)
>         at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>         at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>         at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>         at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
>         at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:247)
>         at 
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:261)
>         at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:317)
> Caused by: org.apache.kudu.client.NonRecoverableException: failed to wait for 
> Hive Metastore notification log listener to catch up: failed to retrieve 
> notification log events: failed to get Hive Metastore next notification: No 
> more data to read.
>         at 
> org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
>         at 
> org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:564)
>         at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:140)
>         at 
> org.apache.impala.service.KuduCatalogOpExecutor.createKuduTable(KuduCatalogOpExecutor.java:105)
>         at 
> org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:146)
>         ... 11 more 
>         Suppressed: org.apache.kudu.client.KuduException$OriginalException: 
> Original asynchronous stack trace
>                 at 
> org.apache.kudu.client.RpcProxy.dispatchMasterError(RpcProxy.java:414)
>                 at 
> org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:288)
>                 at 
> org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:64)
>                 at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:158)
>                 at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:154)
>                 at 
> org.apache.kudu.client.Connection.channelRead0(Connection.java:377)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>                 at 
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
>                 at 
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>                 at 
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
>                 at 
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
>                 at 
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
>                 at 
> org.apache.kudu.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>                 at 
> org.apache.kudu.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>                 at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                 at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                 at java.lang.Thread.run(Thread.java:750){noformat}
> Looking into the kudu-master logs under logs/cluster/cdh7-node-1/kudu/master, 
> they do show the request and the error:
> {noformat}
> I20250309 09:16:41.136718 10678 catalog_manager.cc:1824] Servicing 
> CreateTable request from {username='quanlong'} at 127.0.0.1:41168:
> name: "test_drop_managed_kudu_table_a82c250c.foo"
> schema {
>   columns {
>     name: "a" 
>     type: INT32
>     is_key: true
>     is_nullable: false
>     cfile_block_size: 0
>     immutable: false
>     is_auto_incrementing: false
>   }
> }
> partition_schema {
>   hash_schema {
>     columns {
>       name: "a" 
>     }   
>     num_buckets: 3
>     seed: 0
>   }
>   range_schema {
>   }
> }
> owner: "quanlong"
> I20250309 09:16:41.526204 10676 catalog_manager.cc:5300] T 
> e2161695372d4444b77025ca654b5747 P 2bb1c6d7b90f417db34cf73df443b148 reported 
> cstate change: term changed from 336 to 337. New cstate: current_term: 337 
> leader_uuid: "2bb1c6d7b90f417db34cf73df443b148" committed_config { 
> opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: 
> "14072394022049af8dbdc51ffd963a96" member_type: VOTER last_known_addr { host: 
> "127.0.0.1" port: 31200 } health_report { overall_health: UNKNOWN } } peers { 
> permanent_uuid: "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER 
> last_known_addr { host: "127.0.0.1" port: 31202 } health_report { 
> overall_health: UNKNOWN } } peers { permanent_uuid: 
> "2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: 
> "127.0.0.1" port: 31201 } health_report { overall_health: HEALTHY } } }
> I20250309 09:16:41.653339 10680 catalog_manager.cc:5300] T 
> b17dd536e95d4719a0b78590b162f383 P 14072394022049af8dbdc51ffd963a96 reported 
> cstate change: term changed from 315 to 316. New cstate: current_term: 316 
> leader_uuid: "14072394022049af8dbdc51ffd963a96" committed_config { 
> opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: 
> "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER last_known_addr { host: 
> "127.0.0.1" port: 31202 } health_report { overall_health: UNKNOWN } } peers { 
> permanent_uuid: "14072394022049af8dbdc51ffd963a96" member_type: VOTER 
> last_known_addr { host: "127.0.0.1" port: 31200 } health_report { 
> overall_health: HEALTHY } } peers { permanent_uuid: 
> "2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: 
> "127.0.0.1" port: 31201 } health_report { overall_health: UNKNOWN } } }
> W20250309 09:16:45.716501 10970 hms_client.cc:367] Time spent get HMS 
> notification events: real 4.579s  user 0.000s     sys 0.000s
> W20250309 09:16:50.320043 10970 hms_client.cc:367] Time spent get HMS 
> notification events: real 4.602s  user 0.000s     sys 0.000s
> W20250309 09:16:50.320114 10970 client.h:208] Call to Hive Metastore failed 
> after 1 retries: Network error: failed to get Hive Metastore next 
> notification: No more data to read.
> W20250309 09:16:50.320169 10972 hms_notification_log_listener.cc:134] Hive 
> Metastore notification log listener poll failed: Network error: failed to 
> retrieve notification log events: failed to get Hive Metastore next 
> notification: No more data to read.
> I20250309 09:16:50.320242 10678 rpcz_store.cc:269] Call 
> kudu.master.MasterService.CreateTable from 127.0.0.1:41168 (request call id 
> 2) took 9183ms. Trace:
> I20250309 09:16:50.320261 10678 rpcz_store.cc:270] 0309 09:16:41.136692 (+    
>  0us) service_pool.cc:169] Inserting onto call queue
> 0309 09:16:41.136701 (+     9us) service_pool.cc:228] Handling call
> 0309 09:16:50.320240 (+9183539us) inbound_call.cc:173] Queueing success 
> response
> Metrics: {}{noformat}
> The HMS logs show OOM errors around the same time:
> {noformat}
> 2025-03-09T09:16:41,137  INFO [TThreadPoolServer WorkerProcess-214] 
> metastore.HiveMetaStore: 157: source:127.0.0.1 get_config_value: 
> name=hive.metastore.notifications.add.thrift.objects defaultValue=true
> 2025-03-09T09:16:41,137  INFO [TThreadPoolServer WorkerProcess-214] 
> HiveMetaStore.audit: ugi=quanlong   ip=127.0.0.1    cmd=source:127.0.0.1 
> get_config_value: name=hive.metastore.notifications.add.thrift.objects 
> defaultValue=true
> 2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
> metastore.HiveMetaStore: 157: Opening raw store with implementation 
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
> metastore.ObjectStore: RawStore: 
> org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with 
> PersistenceManager: null will be shutdown
> 2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
> metastore.ObjectStore: RawStore: 
> org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with 
> PersistenceManager: org.datanucleus.api.jdo.JDOPersistenceManager@3ffb4518 
> created in the thread with id: 1576
> 2025-03-09T09:16:41,139  INFO [TThreadPoolServer WorkerProcess-214] 
> metastore.HiveMetaStore: Created RawStore: 
> org.apache.hadoop.hive.metastore.ObjectStore@5894fa72 from thread id: 1576
> 2025-03-09T09:16:45,715 ERROR [TThreadPoolServer WorkerProcess-214] 
> metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap 
> space{noformat}
> The first OOM error has the stacktrace:
> {noformat}
> 2025-03-08T07:04:58,313 ERROR [TThreadPoolServer WorkerProcess-32] 
> metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.StringCoding.decode(StringCoding.java:215)
>         at java.lang.String.<init>(String.java:463)
>         at org.postgresql.core.Encoding.decode(Encoding.java:284)
>         at org.postgresql.core.Encoding.decode(Encoding.java:295)
>         at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2256)
>         at 
> com.zaxxer.hikari.pool.HikariProxyResultSet.getString(HikariProxyResultSet.java)
>         at 
> org.datanucleus.store.rdbms.mapping.column.LongVarcharColumnMapping.getString(LongVarcharColumnMapping.java:102)
>         at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.getString(SingleFieldMapping.java:188)
>         at 
> org.datanucleus.store.rdbms.fieldmanager.ResultSetGetter.fetchStringField(ResultSetGetter.java:133)
>         at 
> org.datanucleus.state.StateManagerImpl.replacingStringField(StateManagerImpl.java:1986)
>         at 
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceField(MNotificationLog.java)
>         at 
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceFields(MNotificationLog.java)
>         at 
> org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4352)
>         at 
> org.datanucleus.store.rdbms.query.PersistentClassROF$1.fetchFields(PersistentClassROF.java:528)
>         at 
> org.datanucleus.state.StateManagerImpl.loadFieldValues(StateManagerImpl.java:3743)
>         at 
> org.datanucleus.state.StateManagerImpl.initialiseForHollow(StateManagerImpl.java:383)
>         at 
> org.datanucleus.state.ObjectProviderFactoryImpl.newForHollow(ObjectProviderFactoryImpl.java:99)
>         at 
> org.datanucleus.ExecutionContextImpl.findObject(ExecutionContextImpl.java:3199)
>         at 
> org.datanucleus.store.rdbms.query.PersistentClassROF.findObjectWithIdAndLoadFields(PersistentClassROF.java:523)
>         at 
> org.datanucleus.store.rdbms.query.PersistentClassROF.getObject(PersistentClassROF.java:456)
>         at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
>         at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:409)
>         at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.processNumberOfResults(ForwardQueryResult.java:137)
>         at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.advanceToEndOfResultSet(ForwardQueryResult.java:165)
>         at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.closingConnection(ForwardQueryResult.java:291)
>         at 
> org.datanucleus.store.query.AbstractQueryResult.disconnect(AbstractQueryResult.java:105)
>         at 
> org.datanucleus.store.rdbms.query.AbstractRDBMSQueryResult.disconnect(AbstractRDBMSQueryResult.java:282)
>         at 
> org.datanucleus.store.rdbms.query.JDOQLQuery$2.transactionPreClose(JDOQLQuery.java:725)
>         at 
> org.datanucleus.store.connection.AbstractManagedConnection.transactionPreClose(AbstractManagedConnection.java:117)
>         at 
> org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionPreCommit(ConnectionManagerImpl.java:514)
>         at 
> org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:397)
>         at 
> org.datanucleus.TransactionImpl.commit(TransactionImpl.java:287){noformat}
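> The stack trace suggests the OOM comes from DataNucleus draining the remaining 
> MNotificationLog rows (including their message payloads) when the query result 
> is disconnected at commit time, so with ~1.1 million events in NOTIFICATION_LOG 
> the decoded strings alone can exhaust the HMS heap. A rough psql sketch to 
> gauge that payload, assuming the standard "MESSAGE" column of the notification 
> log table:
> {code}
> -- total size of the notification payloads the HMS may end up materializing
> select count(*), pg_size_pretty(sum(length("MESSAGE"))) as total_message_size
> from "NOTIFICATION_LOG";
> {code}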



