[
https://issues.apache.org/jira/browse/IMPALA-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933575#comment-17933575
]
Quanlong Huang commented on IMPALA-13846:
-----------------------------------------
In my local env, there are lots of HMS notification events from several years
ago, even though I didn't set hive.metastore.event.db.listener.timetolive to
such a high value:
{noformat}
$ psql -q -U hiveuser ${METASTORE_DB}
HMS_home_quanlong_workspace_Impala_cdp=> select count(*) from "NOTIFICATION_LOG";
  count
---------
 1134925
(1 row)
HMS_home_quanlong_workspace_Impala_cdp=> select "EVENT_ID", to_timestamp("EVENT_TIME"), "EVENT_TYPE" from "NOTIFICATION_LOG" order by "EVENT_ID" limit 10;
EVENT_ID | to_timestamp | EVENT_TYPE
----------+------------------------+---------------
7990170 | 2022-12-26 15:22:59+08 | ADD_PARTITION
7990171 | 2022-12-26 15:23:09+08 | ADD_PARTITION
7990172 | 2022-12-26 15:23:20+08 | ADD_PARTITION
7990173 | 2022-12-26 15:23:30+08 | ADD_PARTITION
7990174 | 2022-12-26 15:23:41+08 | ADD_PARTITION
7990175 | 2022-12-26 15:23:51+08 | ADD_PARTITION
7990176 | 2022-12-26 15:24:01+08 | ADD_PARTITION
7990177 | 2022-12-26 15:24:12+08 | ADD_PARTITION
7990178 | 2022-12-26 15:24:23+08 | ADD_PARTITION
7990179 | 2022-12-26 15:24:33+08 | ADD_PARTITION
(10 rows){noformat}
Thanks to [~dengzh]'s help, we found they are not cleaned up because the
DB-Notification-Cleaner thread died (HIVE-28808).
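For anyone hitting the same symptom, a quick way to confirm that cleanup is not happening is to count events older than the TTL. Below is a sketch against the Postgres metastore DB; it assumes the default hive.metastore.event.db.listener.timetolive of 86400 seconds, so adjust the interval to the configured value:
{code:sql}
-- Sketch: count notification events older than the TTL. A non-zero count that
-- keeps growing means the DB-Notification-Cleaner thread is not purging them.
-- Assumes the default hive.metastore.event.db.listener.timetolive of 86400s.
SELECT count(*) AS stale_events
FROM "NOTIFICATION_LOG"
WHERE to_timestamp("EVENT_TIME") < now() - interval '86400 seconds';
{code}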
> test_drop_managed_kudu_table failed in CreateTable
> --------------------------------------------------
>
> Key: IMPALA-13846
> URL: https://issues.apache.org/jira/browse/IMPALA-13846
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
>
> custom_cluster/test_kudu.py::TestKuduHMSIntegration::test_drop_managed_kudu_table
> consistently failed in my local env. The failure occurs when creating the
> Kudu managed table with HMS integration enabled.
> {code}
> tests/custom_cluster/test_kudu.py:306: in test_drop_managed_kudu_table
>     PARTITIONS 3 STORED AS KUDU""" % (unique_database, impala_tbl_name))
> infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:343: in execute
>     self._wait_to_finish()  # make execute synchronous
> infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:428: in _wait_to_finish
>     raise OperationalError(resp.errorMessage)
> E   OperationalError: Query 3c4eebe048d531d7:97c6ea2f00000000 failed:
> E   ImpalaRuntimeException: Error creating Kudu table 'test_drop_managed_kudu_table_a82c250c.foo'
> E   CAUSED BY: NonRecoverableException: failed to wait for Hive Metastore notification log listener to catch up: failed to retrieve notification log events: failed to get Hive Metastore next notification: No more data to read.{code}
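> For context, the statement the test issues at that point boils down to roughly the following DDL (a sketch; the exact statement is built in test_kudu.py and matches the CreateTable request shown in the kudu-master log below):
> {code:sql}
> -- Rough equivalent of the DDL issued by the test (sketch, not the verbatim statement)
> CREATE TABLE test_drop_managed_kudu_table_a82c250c.foo (a INT PRIMARY KEY)
>   PARTITION BY HASH (a) PARTITIONS 3
>   STORED AS KUDU;
> {code}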
> Catalogd logs show that the error is thrown in KuduClient.createTable(),
> caused by an error returned from the Kudu master (RpcProxy.dispatchMasterError):
> {noformat}
> E0309 09:16:50.324411 13234 JniUtil.java:184] de42baedc07418aa:e535aa7000000000] Error in CREATE_TABLE test_drop_managed_kudu_table_a82c250c.foo issued by quanlong. Time spent: 10s139ms
> I0309 09:16:50.324739 13234 jni-util.cc:321] de42baedc07418aa:e535aa7000000000] org.apache.impala.common.ImpalaRuntimeException: Error creating Kudu table 'test_drop_managed_kudu_table_a82c250c.foo'
> at
> org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:157)
> at
> org.apache.impala.service.CatalogOpExecutor.createKuduTable(CatalogOpExecutor.java:3885)
> at
> org.apache.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:3749)
> at
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:491)
> at
> org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:318)
> at
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:247)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:261)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:317)
> Caused by: org.apache.kudu.client.NonRecoverableException: failed to wait for Hive Metastore notification log listener to catch up: failed to retrieve notification log events: failed to get Hive Metastore next notification: No more data to read.
> at
> org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
> at
> org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:564)
> at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:140)
> at
> org.apache.impala.service.KuduCatalogOpExecutor.createKuduTable(KuduCatalogOpExecutor.java:105)
> at
> org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:146)
> ... 11 more
> Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
> at
> org.apache.kudu.client.RpcProxy.dispatchMasterError(RpcProxy.java:414)
> at
> org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:288)
> at
> org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:64)
> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:158)
> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:154)
> at
> org.apache.kudu.client.Connection.channelRead0(Connection.java:377)
> at
> org.apache.kudu.shaded.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
> at
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
> at
> org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at
> org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
> at
> org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> at
> org.apache.kudu.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
> at
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
> at
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
> at
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
> at
> org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> at
> org.apache.kudu.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
> at
> org.apache.kudu.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750){noformat}
> Looking into the kudu-master logs under logs/cluster/cdh7-node-1/kudu/master,
> they do show the request and the error:
> {noformat}
> I20250309 09:16:41.136718 10678 catalog_manager.cc:1824] Servicing CreateTable request from {username='quanlong'} at 127.0.0.1:41168:
> name: "test_drop_managed_kudu_table_a82c250c.foo"
> schema {
>   columns {
>     name: "a"
>     type: INT32
>     is_key: true
>     is_nullable: false
>     cfile_block_size: 0
>     immutable: false
>     is_auto_incrementing: false
>   }
> }
> partition_schema {
>   hash_schema {
>     columns {
>       name: "a"
>     }
>     num_buckets: 3
>     seed: 0
>   }
>   range_schema {
>   }
> }
> owner: "quanlong"
> I20250309 09:16:41.526204 10676 catalog_manager.cc:5300] T e2161695372d4444b77025ca654b5747 P 2bb1c6d7b90f417db34cf73df443b148 reported cstate change: term changed from 336 to 337. New cstate: current_term: 337 leader_uuid: "2bb1c6d7b90f417db34cf73df443b148" committed_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "14072394022049af8dbdc51ffd963a96" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31200 } health_report { overall_health: UNKNOWN } } peers { permanent_uuid: "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31202 } health_report { overall_health: UNKNOWN } } peers { permanent_uuid: "2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31201 } health_report { overall_health: HEALTHY } } }
> I20250309 09:16:41.653339 10680 catalog_manager.cc:5300] T b17dd536e95d4719a0b78590b162f383 P 14072394022049af8dbdc51ffd963a96 reported cstate change: term changed from 315 to 316. New cstate: current_term: 316 leader_uuid: "14072394022049af8dbdc51ffd963a96" committed_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31202 } health_report { overall_health: UNKNOWN } } peers { permanent_uuid: "14072394022049af8dbdc51ffd963a96" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31200 } health_report { overall_health: HEALTHY } } peers { permanent_uuid: "2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 31201 } health_report { overall_health: UNKNOWN } } }
> W20250309 09:16:45.716501 10970 hms_client.cc:367] Time spent get HMS notification events: real 4.579s user 0.000s sys 0.000s
> W20250309 09:16:50.320043 10970 hms_client.cc:367] Time spent get HMS notification events: real 4.602s user 0.000s sys 0.000s
> W20250309 09:16:50.320114 10970 client.h:208] Call to Hive Metastore failed after 1 retries: Network error: failed to get Hive Metastore next notification: No more data to read.
> W20250309 09:16:50.320169 10972 hms_notification_log_listener.cc:134] Hive Metastore notification log listener poll failed: Network error: failed to retrieve notification log events: failed to get Hive Metastore next notification: No more data to read.
> I20250309 09:16:50.320242 10678 rpcz_store.cc:269] Call kudu.master.MasterService.CreateTable from 127.0.0.1:41168 (request call id 2) took 9183ms. Trace:
> I20250309 09:16:50.320261 10678 rpcz_store.cc:270] 0309 09:16:41.136692 (+ 0us) service_pool.cc:169] Inserting onto call queue
> 0309 09:16:41.136701 (+ 9us) service_pool.cc:228] Handling call
> 0309 09:16:50.320240 (+9183539us) inbound_call.cc:173] Queueing success response
> Metrics: {}{noformat}
> Looking into the HMS logs, I found some OOM errors:
> {noformat}
> 2025-03-09T09:16:41,137 INFO [TThreadPoolServer WorkerProcess-214] metastore.HiveMetaStore: 157: source:127.0.0.1 get_config_value: name=hive.metastore.notifications.add.thrift.objects defaultValue=true
> 2025-03-09T09:16:41,137 INFO [TThreadPoolServer WorkerProcess-214] HiveMetaStore.audit: ugi=quanlong ip=127.0.0.1 cmd=source:127.0.0.1 get_config_value: name=hive.metastore.notifications.add.thrift.objects defaultValue=true
> 2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214] metastore.HiveMetaStore: 157: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214] metastore.ObjectStore: RawStore: org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager: null will be shutdown
> 2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214] metastore.ObjectStore: RawStore: org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager: org.datanucleus.api.jdo.JDOPersistenceManager@3ffb4518 created in the thread with id: 1576
> 2025-03-09T09:16:41,139 INFO [TThreadPoolServer WorkerProcess-214] metastore.HiveMetaStore: Created RawStore: org.apache.hadoop.hive.metastore.ObjectStore@5894fa72 from thread id: 1576
> 2025-03-09T09:16:45,715 ERROR [TThreadPoolServer WorkerProcess-214] metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap space{noformat}
> The first OOM error has the stacktrace:
> {noformat}
> 2025-03-08T07:04:58,313 ERROR [TThreadPoolServer WorkerProcess-32] metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap space
> at java.lang.StringCoding.decode(StringCoding.java:215)
> at java.lang.String.<init>(String.java:463)
> at org.postgresql.core.Encoding.decode(Encoding.java:284)
> at org.postgresql.core.Encoding.decode(Encoding.java:295)
> at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2256)
> at
> com.zaxxer.hikari.pool.HikariProxyResultSet.getString(HikariProxyResultSet.java)
> at
> org.datanucleus.store.rdbms.mapping.column.LongVarcharColumnMapping.getString(LongVarcharColumnMapping.java:102)
> at
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.getString(SingleFieldMapping.java:188)
> at
> org.datanucleus.store.rdbms.fieldmanager.ResultSetGetter.fetchStringField(ResultSetGetter.java:133)
> at
> org.datanucleus.state.StateManagerImpl.replacingStringField(StateManagerImpl.java:1986)
> at
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceField(MNotificationLog.java)
> at
> org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceFields(MNotificationLog.java)
> at
> org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4352)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF$1.fetchFields(PersistentClassROF.java:528)
> at
> org.datanucleus.state.StateManagerImpl.loadFieldValues(StateManagerImpl.java:3743)
> at
> org.datanucleus.state.StateManagerImpl.initialiseForHollow(StateManagerImpl.java:383)
> at
> org.datanucleus.state.ObjectProviderFactoryImpl.newForHollow(ObjectProviderFactoryImpl.java:99)
> at
> org.datanucleus.ExecutionContextImpl.findObject(ExecutionContextImpl.java:3199)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF.findObjectWithIdAndLoadFields(PersistentClassROF.java:523)
> at
> org.datanucleus.store.rdbms.query.PersistentClassROF.getObject(PersistentClassROF.java:456)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:409)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.processNumberOfResults(ForwardQueryResult.java:137)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.advanceToEndOfResultSet(ForwardQueryResult.java:165)
> at
> org.datanucleus.store.rdbms.query.ForwardQueryResult.closingConnection(ForwardQueryResult.java:291)
> at
> org.datanucleus.store.query.AbstractQueryResult.disconnect(AbstractQueryResult.java:105)
> at
> org.datanucleus.store.rdbms.query.AbstractRDBMSQueryResult.disconnect(AbstractRDBMSQueryResult.java:282)
> at
> org.datanucleus.store.rdbms.query.JDOQLQuery$2.transactionPreClose(JDOQLQuery.java:725)
> at
> org.datanucleus.store.connection.AbstractManagedConnection.transactionPreClose(AbstractManagedConnection.java:117)
> at
> org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionPreCommit(ConnectionManagerImpl.java:514)
> at
> org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:397)
> at
> org.datanucleus.TransactionImpl.commit(TransactionImpl.java:287){noformat}