Quanlong Huang created IMPALA-13846:
---------------------------------------
Summary: test_drop_managed_kudu_table failed in CreateTable
Key: IMPALA-13846
URL: https://issues.apache.org/jira/browse/IMPALA-13846
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang
Assignee: Quanlong Huang
custom_cluster/test_kudu.py::TestKuduHMSIntegration::test_drop_managed_kudu_table
fails consistently in my local environment. The failure occurs while creating the
Kudu managed table with HMS integration enabled.
{code}
tests/custom_cluster/test_kudu.py:306: in test_drop_managed_kudu_table
    PARTITIONS 3 STORED AS KUDU""" % (unique_database, impala_tbl_name))
infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:343: in execute
    self._wait_to_finish()  # make execute synchronous
infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:428: in _wait_to_finish
    raise OperationalError(resp.errorMessage)
E   OperationalError: Query 3c4eebe048d531d7:97c6ea2f00000000 failed:
E   ImpalaRuntimeException: Error creating Kudu table 'test_drop_managed_kudu_table_a82c250c.foo'
E   CAUSED BY: NonRecoverableException: failed to wait for Hive Metastore notification log listener to catch up: failed to retrieve notification log events: failed to get Hive Metastore next notification: No more data to read.{code}
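For context, the DDL the test submits can be reconstructed from the CreateTable request in the kudu-master log further below (a single INT32 key column "a", hash-partitioned into 3 buckets). This is a hypothetical reconstruction, since the traceback only shows the last line of the statement:

```python
# Hypothetical reconstruction of the test's DDL; the column and partition
# shape is taken from the CreateTable request in the kudu-master log.
def make_create_table_stmt(db, tbl):
    return ("CREATE TABLE %s.%s (a INT PRIMARY KEY) "
            "PARTITION BY HASH (a) PARTITIONS 3 STORED AS KUDU" % (db, tbl))

stmt = make_create_table_stmt("test_drop_managed_kudu_table_a82c250c", "foo")
```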
Catalogd logs show that the error is thrown by KuduClient.createTable(), which
surfaces an error returned from the Kudu master (RpcProxy.dispatchMasterError):
{noformat}
E0309 09:16:50.324411 13234 JniUtil.java:184] de42baedc07418aa:e535aa7000000000] Error in CREATE_TABLE test_drop_managed_kudu_table_a82c250c.foo issued by quanlong. Time spent: 10s139ms
I0309 09:16:50.324739 13234 jni-util.cc:321] de42baedc07418aa:e535aa7000000000] org.apache.impala.common.ImpalaRuntimeException: Error creating Kudu table 'test_drop_managed_kudu_table_a82c250c.foo'
    at org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:157)
    at org.apache.impala.service.CatalogOpExecutor.createKuduTable(CatalogOpExecutor.java:3885)
    at org.apache.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:3749)
    at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:491)
    at org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:318)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:247)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:261)
    at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:317)
Caused by: org.apache.kudu.client.NonRecoverableException: failed to wait for Hive Metastore notification log listener to catch up: failed to retrieve notification log events: failed to get Hive Metastore next notification: No more data to read.
    at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
    at org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:564)
    at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:140)
    at org.apache.impala.service.KuduCatalogOpExecutor.createKuduTable(KuduCatalogOpExecutor.java:105)
    at org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:146)
    ... 11 more
    Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
        at org.apache.kudu.client.RpcProxy.dispatchMasterError(RpcProxy.java:414)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:288)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:64)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:158)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:154)
        at org.apache.kudu.client.Connection.channelRead0(Connection.java:377)
        at org.apache.kudu.shaded.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
        at org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
        at org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at org.apache.kudu.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at org.apache.kudu.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at org.apache.kudu.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){noformat}
The kudu-master logs under logs/cluster/cdh7-node-1/kudu/master do show both the
CreateTable request and the error:
{noformat}
I20250309 09:16:41.136718 10678 catalog_manager.cc:1824] Servicing CreateTable
request from {username='quanlong'} at 127.0.0.1:41168:
name: "test_drop_managed_kudu_table_a82c250c.foo"
schema {
columns {
name: "a"
type: INT32
is_key: true
is_nullable: false
cfile_block_size: 0
immutable: false
is_auto_incrementing: false
}
}
partition_schema {
hash_schema {
columns {
name: "a"
}
num_buckets: 3
seed: 0
}
range_schema {
}
}
owner: "quanlong"
I20250309 09:16:41.526204 10676 catalog_manager.cc:5300] T
e2161695372d4444b77025ca654b5747 P 2bb1c6d7b90f417db34cf73df443b148 reported
cstate change: term changed from 336 to 337. New cstate: current_term: 337
leader_uuid: "2bb1c6d7b90f417db34cf73df443b148" committed_config { opid_index:
-1 OBSOLETE_local: false peers { permanent_uuid:
"14072394022049af8dbdc51ffd963a96" member_type: VOTER last_known_addr { host:
"127.0.0.1" port: 31200 } health_report { overall_health: UNKNOWN } } peers {
permanent_uuid: "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER
last_known_addr { host: "127.0.0.1" port: 31202 } health_report {
overall_health: UNKNOWN } } peers { permanent_uuid:
"2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host:
"127.0.0.1" port: 31201 } health_report { overall_health: HEALTHY } } }
I20250309 09:16:41.653339 10680 catalog_manager.cc:5300] T
b17dd536e95d4719a0b78590b162f383 P 14072394022049af8dbdc51ffd963a96 reported
cstate change: term changed from 315 to 316. New cstate: current_term: 316
leader_uuid: "14072394022049af8dbdc51ffd963a96" committed_config { opid_index:
-1 OBSOLETE_local: false peers { permanent_uuid:
"e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER last_known_addr { host:
"127.0.0.1" port: 31202 } health_report { overall_health: UNKNOWN } } peers {
permanent_uuid: "14072394022049af8dbdc51ffd963a96" member_type: VOTER
last_known_addr { host: "127.0.0.1" port: 31200 } health_report {
overall_health: HEALTHY } } peers { permanent_uuid:
"2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host:
"127.0.0.1" port: 31201 } health_report { overall_health: UNKNOWN } } }
W20250309 09:16:45.716501 10970 hms_client.cc:367] Time spent get HMS
notification events: real 4.579s user 0.000s sys 0.000s
W20250309 09:16:50.320043 10970 hms_client.cc:367] Time spent get HMS
notification events: real 4.602s user 0.000s sys 0.000s
W20250309 09:16:50.320114 10970 client.h:208] Call to Hive Metastore failed
after 1 retries: Network error: failed to get Hive Metastore next notification:
No more data to read.
W20250309 09:16:50.320169 10972 hms_notification_log_listener.cc:134] Hive
Metastore notification log listener poll failed: Network error: failed to
retrieve notification log events: failed to get Hive Metastore next
notification: No more data to read.
I20250309 09:16:50.320242 10678 rpcz_store.cc:269] Call
kudu.master.MasterService.CreateTable from 127.0.0.1:41168 (request call id 2)
took 9183ms. Trace:
I20250309 09:16:50.320261 10678 rpcz_store.cc:270] 0309 09:16:41.136692 (+
0us) service_pool.cc:169] Inserting onto call queue
0309 09:16:41.136701 (+ 9us) service_pool.cc:228] Handling call
0309 09:16:50.320240 (+9183539us) inbound_call.cc:173] Queueing success response
Metrics: {}{noformat}
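The "failed after 1 retries" warning above means the Kudu master abandons the HMS call after a single retry, so the notification log listener's poll fails and the error propagates back through CreateTable. A minimal sketch of that retry shape (hypothetical names, not Kudu's actual code):

```python
class RetriesExhausted(Exception):
    pass

def call_with_retries(fn, max_retries=1):
    # Try the call once, then retry up to max_retries more times; mirrors the
    # "Call to Hive Metastore failed after 1 retries" behavior in the log.
    last_err = None
    for _ in range(1 + max_retries):
        try:
            return fn()
        except Exception as e:  # in Kudu this is the HMS network error
            last_err = e
    raise RetriesExhausted(
        "failed after %d retries: %s" % (max_retries, last_err))
```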
The HMS logs around the same timestamps contain OOM errors:
{noformat}
2025-03-09T09:16:41,137 INFO [TThreadPoolServer WorkerProcess-214]
metastore.HiveMetaStore: 157: source:127.0.0.1 get_config_value:
name=hive.metastore.notifications.add.thrift.objects defaultValue=true
2025-03-09T09:16:41,137 INFO [TThreadPoolServer WorkerProcess-214]
HiveMetaStore.audit: ugi=quanlong ip=127.0.0.1 cmd=source:127.0.0.1
get_config_value: name=hive.metastore.notifications.add.thrift.objects
defaultValue=true
2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214]
metastore.HiveMetaStore: 157: Opening raw store with implementation
class:org.apache.hadoop.hive.metastore.ObjectStore
2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214]
metastore.ObjectStore: RawStore:
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager:
null will be shutdown
2025-03-09T09:16:41,138 INFO [TThreadPoolServer WorkerProcess-214]
metastore.ObjectStore: RawStore:
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager:
org.datanucleus.api.jdo.JDOPersistenceManager@3ffb4518 created in the thread
with id: 1576
2025-03-09T09:16:41,139 INFO [TThreadPoolServer WorkerProcess-214]
metastore.HiveMetaStore: Created RawStore:
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72 from thread id: 1576
2025-03-09T09:16:45,715 ERROR [TThreadPoolServer WorkerProcess-214]
metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap
space{noformat}
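The OOM fires while HMS materializes notification-log rows for the listener's fetch: the stack trace that follows shows DataNucleus advancing the whole query result set to its end in memory. As an illustration only (not HMS code), reading an unbounded event backlog into one list is what exhausts the heap, whereas consuming it in bounded batches keeps peak memory proportional to the batch size:

```python
def fetch_all(events):
    # Unbounded: materializes the entire backlog at once, analogous to
    # loading every notification-log row for one request into the heap.
    return list(events)

def fetch_batched(events, batch_size=1000):
    # Bounded: yields fixed-size batches, so peak memory stays around
    # batch_size no matter how large the backlog has grown.
    batch = []
    for ev in events:
        batch.append(ev)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```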
The first OOM error has the following stack trace, which shows the heap being
exhausted while materializing MNotificationLog rows from the metastore database:
{noformat}
2025-03-08T07:04:58,313 ERROR [TThreadPoolServer WorkerProcess-32]
metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding.decode(StringCoding.java:215)
    at java.lang.String.<init>(String.java:463)
    at org.postgresql.core.Encoding.decode(Encoding.java:284)
    at org.postgresql.core.Encoding.decode(Encoding.java:295)
    at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2256)
    at com.zaxxer.hikari.pool.HikariProxyResultSet.getString(HikariProxyResultSet.java)
    at org.datanucleus.store.rdbms.mapping.column.LongVarcharColumnMapping.getString(LongVarcharColumnMapping.java:102)
    at org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.getString(SingleFieldMapping.java:188)
    at org.datanucleus.store.rdbms.fieldmanager.ResultSetGetter.fetchStringField(ResultSetGetter.java:133)
    at org.datanucleus.state.StateManagerImpl.replacingStringField(StateManagerImpl.java:1986)
    at org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceField(MNotificationLog.java)
    at org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceFields(MNotificationLog.java)
    at org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4352)
    at org.datanucleus.store.rdbms.query.PersistentClassROF$1.fetchFields(PersistentClassROF.java:528)
    at org.datanucleus.state.StateManagerImpl.loadFieldValues(StateManagerImpl.java:3743)
    at org.datanucleus.state.StateManagerImpl.initialiseForHollow(StateManagerImpl.java:383)
    at org.datanucleus.state.ObjectProviderFactoryImpl.newForHollow(ObjectProviderFactoryImpl.java:99)
    at org.datanucleus.ExecutionContextImpl.findObject(ExecutionContextImpl.java:3199)
    at org.datanucleus.store.rdbms.query.PersistentClassROF.findObjectWithIdAndLoadFields(PersistentClassROF.java:523)
    at org.datanucleus.store.rdbms.query.PersistentClassROF.getObject(PersistentClassROF.java:456)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:409)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.processNumberOfResults(ForwardQueryResult.java:137)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.advanceToEndOfResultSet(ForwardQueryResult.java:165)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.closingConnection(ForwardQueryResult.java:291)
    at org.datanucleus.store.query.AbstractQueryResult.disconnect(AbstractQueryResult.java:105)
    at org.datanucleus.store.rdbms.query.AbstractRDBMSQueryResult.disconnect(AbstractRDBMSQueryResult.java:282)
    at org.datanucleus.store.rdbms.query.JDOQLQuery$2.transactionPreClose(JDOQLQuery.java:725)
    at org.datanucleus.store.connection.AbstractManagedConnection.transactionPreClose(AbstractManagedConnection.java:117)
    at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionPreCommit(ConnectionManagerImpl.java:514)
    at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:397)
    at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:287){noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)