Quanlong Huang created IMPALA-13846:
---------------------------------------

             Summary: test_drop_managed_kudu_table failed in CreateTable
                 Key: IMPALA-13846
                 URL: https://issues.apache.org/jira/browse/IMPALA-13846
             Project: IMPALA
          Issue Type: Bug
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


custom_cluster/test_kudu.py::TestKuduHMSIntegration::test_drop_managed_kudu_table
consistently failed in my local env. The failure occurs when creating the Kudu
managed table with HMS integration enabled.
{code}
tests/custom_cluster/test_kudu.py:306: in test_drop_managed_kudu_table
    PARTITIONS 3 STORED AS KUDU""" % (unique_database, impala_tbl_name))
infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:343:
 in execute
    self._wait_to_finish()  # make execute synchronous
infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:428:
 in _wait_to_finish
    raise OperationalError(resp.errorMessage)
E   OperationalError: Query 3c4eebe048d531d7:97c6ea2f00000000 failed:
E   ImpalaRuntimeException: Error creating Kudu table 
'test_drop_managed_kudu_table_a82c250c.foo'
E   CAUSED BY: NonRecoverableException: failed to wait for Hive Metastore 
notification log listener to catch up: failed to retrieve notification log 
events: failed to get Hive Metastore next notification: No more data to 
read.{code}
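For reference, below is a minimal sketch (not the exact test code) of the DDL the 
test issues through the impyla HiveServer2 client, reconstructed from the 
traceback above and the CreateTable request in the kudu-master log further down; 
the connection details are assumptions:
{code:python}
# Hedged reproduction of the failing DDL; the real statement lives in
# tests/custom_cluster/test_kudu.py (test_drop_managed_kudu_table).
from impala.dbapi import connect

conn = connect(host='localhost', port=21050)  # assumed default Impala HS2 port
cursor = conn.cursor()
# Single INT key column hash-partitioned into 3 buckets, matching the
# CreateTable request logged by the Kudu master.
cursor.execute("""
    CREATE TABLE test_drop_managed_kudu_table_a82c250c.foo (
      a INT PRIMARY KEY
    )
    PARTITION BY HASH (a) PARTITIONS 3
    STORED AS KUDU""")
{code}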

Catalogd logs show that the error is thrown in KuduClient.createTable(), 
propagated from an error returned by the Kudu master (RpcProxy.dispatchMasterError):
{noformat}
E0309 09:16:50.324411 13234 JniUtil.java:184] 
de42baedc07418aa:e535aa7000000000] Error in CREATE_TABLE 
test_drop_managed_kudu_table_a82c250c.foo issued by quanlong. Time spent: 
10s139ms
I0309 09:16:50.324739 13234 jni-util.cc:321] de42baedc07418aa:e535aa7000000000] 
org.apache.impala.common.ImpalaRuntimeException: Error creating Kudu table 
'test_drop_managed_kudu_table_a82c250c.foo'
        at 
org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:157)
        at 
org.apache.impala.service.CatalogOpExecutor.createKuduTable(CatalogOpExecutor.java:3885)
        at 
org.apache.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:3749)
        at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:491)
        at 
org.apache.impala.service.JniCatalog.lambda$execDdl$3(JniCatalog.java:318)
        at 
org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
        at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
        at 
org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:247)
        at 
org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:261)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:317)
Caused by: org.apache.kudu.client.NonRecoverableException: failed to wait for 
Hive Metastore notification log listener to catch up: failed to retrieve 
notification log events: failed to get Hive Metastore next notification: No 
more data to read.
        at 
org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
        at 
org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:564)
        at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:140)
        at 
org.apache.impala.service.KuduCatalogOpExecutor.createKuduTable(KuduCatalogOpExecutor.java:105)
        at 
org.apache.impala.service.KuduCatalogOpExecutor.createSynchronizedTable(KuduCatalogOpExecutor.java:146)
        ... 11 more 
        Suppressed: org.apache.kudu.client.KuduException$OriginalException: 
Original asynchronous stack trace
                at 
org.apache.kudu.client.RpcProxy.dispatchMasterError(RpcProxy.java:414)
                at 
org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:288)
                at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:64)
                at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:158)
                at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:154)
                at 
org.apache.kudu.client.Connection.channelRead0(Connection.java:377)
                at 
org.apache.kudu.shaded.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
                at 
org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
                at 
org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
                at 
org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
                at 
org.apache.kudu.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
                at 
org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
                at 
org.apache.kudu.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
                at 
org.apache.kudu.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
                at 
org.apache.kudu.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
                at 
org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
                at 
org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
                at 
org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
                at 
org.apache.kudu.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
                at 
org.apache.kudu.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
                at 
org.apache.kudu.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
                at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:750){noformat}
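
To confirm the failure is on the Kudu/HMS side rather than in Impala, the same 
CreateTable RPC can be issued directly against the Kudu master with the 
kudu-python client. This is only a hedged diagnostic sketch; the master 
address/port and the table name used for the repro are assumptions:
{code:python}
# Hedged sketch: create an equivalent table directly via kudu-python, bypassing
# Impala. If the HMS notification log listener is the problem, this should fail
# with the same "failed to wait for ... listener to catch up" error.
import kudu
from kudu.client import Partitioning

client = kudu.connect(host='127.0.0.1', port=7051)  # assumed master RPC port

builder = kudu.schema_builder()
builder.add_column('a', kudu.int32, nullable=False)
builder.set_primary_keys(['a'])
schema = builder.build()

partitioning = Partitioning().add_hash_partitions(column_names=['a'],
                                                  num_buckets=3)
client.create_table('test_drop_managed_kudu_table_a82c250c.repro', schema,
                    partitioning)
{code}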

The kudu-master logs under logs/cluster/cdh7-node-1/kudu/master show the 
CreateTable request and the error:
{noformat}
I20250309 09:16:41.136718 10678 catalog_manager.cc:1824] Servicing CreateTable 
request from {username='quanlong'} at 127.0.0.1:41168:
name: "test_drop_managed_kudu_table_a82c250c.foo"
schema {
  columns {
    name: "a" 
    type: INT32
    is_key: true
    is_nullable: false
    cfile_block_size: 0
    immutable: false
    is_auto_incrementing: false
  }
}
partition_schema {
  hash_schema {
    columns {
      name: "a" 
    }   
    num_buckets: 3
    seed: 0
  }
  range_schema {
  }
}
owner: "quanlong"
I20250309 09:16:41.526204 10676 catalog_manager.cc:5300] T 
e2161695372d4444b77025ca654b5747 P 2bb1c6d7b90f417db34cf73df443b148 reported 
cstate change: term changed from 336 to 337. New cstate: current_term: 337 
leader_uuid: "2bb1c6d7b90f417db34cf73df443b148" committed_config { opid_index: 
-1 OBSOLETE_local: false peers { permanent_uuid: 
"14072394022049af8dbdc51ffd963a96" member_type: VOTER last_known_addr { host: 
"127.0.0.1" port: 31200 } health_report { overall_health: UNKNOWN } } peers { 
permanent_uuid: "e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER 
last_known_addr { host: "127.0.0.1" port: 31202 } health_report { 
overall_health: UNKNOWN } } peers { permanent_uuid: 
"2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: 
"127.0.0.1" port: 31201 } health_report { overall_health: HEALTHY } } }
I20250309 09:16:41.653339 10680 catalog_manager.cc:5300] T 
b17dd536e95d4719a0b78590b162f383 P 14072394022049af8dbdc51ffd963a96 reported 
cstate change: term changed from 315 to 316. New cstate: current_term: 316 
leader_uuid: "14072394022049af8dbdc51ffd963a96" committed_config { opid_index: 
-1 OBSOLETE_local: false peers { permanent_uuid: 
"e0e1db54dab74f208e37ea1b975595e5" member_type: VOTER last_known_addr { host: 
"127.0.0.1" port: 31202 } health_report { overall_health: UNKNOWN } } peers { 
permanent_uuid: "14072394022049af8dbdc51ffd963a96" member_type: VOTER 
last_known_addr { host: "127.0.0.1" port: 31200 } health_report { 
overall_health: HEALTHY } } peers { permanent_uuid: 
"2bb1c6d7b90f417db34cf73df443b148" member_type: VOTER last_known_addr { host: 
"127.0.0.1" port: 31201 } health_report { overall_health: UNKNOWN } } }
W20250309 09:16:45.716501 10970 hms_client.cc:367] Time spent get HMS 
notification events: real 4.579s  user 0.000s     sys 0.000s
W20250309 09:16:50.320043 10970 hms_client.cc:367] Time spent get HMS 
notification events: real 4.602s  user 0.000s     sys 0.000s
W20250309 09:16:50.320114 10970 client.h:208] Call to Hive Metastore failed 
after 1 retries: Network error: failed to get Hive Metastore next notification: 
No more data to read.
W20250309 09:16:50.320169 10972 hms_notification_log_listener.cc:134] Hive 
Metastore notification log listener poll failed: Network error: failed to 
retrieve notification log events: failed to get Hive Metastore next 
notification: No more data to read.
I20250309 09:16:50.320242 10678 rpcz_store.cc:269] Call 
kudu.master.MasterService.CreateTable from 127.0.0.1:41168 (request call id 2) 
took 9183ms. Trace:
I20250309 09:16:50.320261 10678 rpcz_store.cc:270] 0309 09:16:41.136692 (+     
0us) service_pool.cc:169] Inserting onto call queue
0309 09:16:41.136701 (+     9us) service_pool.cc:228] Handling call
0309 09:16:50.320240 (+9183539us) inbound_call.cc:173] Queueing success response
Metrics: {}{noformat}

The HMS logs show OOM errors at the same timestamps, which likely explains why 
the connection dropped on the Kudu side ("No more data to read"):
{noformat}
2025-03-09T09:16:41,137  INFO [TThreadPoolServer WorkerProcess-214] 
metastore.HiveMetaStore: 157: source:127.0.0.1 get_config_value: 
name=hive.metastore.notifications.add.thrift.objects defaultValue=true
2025-03-09T09:16:41,137  INFO [TThreadPoolServer WorkerProcess-214] 
HiveMetaStore.audit: ugi=quanlong   ip=127.0.0.1    cmd=source:127.0.0.1 
get_config_value: name=hive.metastore.notifications.add.thrift.objects 
defaultValue=true
2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
metastore.HiveMetaStore: 157: Opening raw store with implementation 
class:org.apache.hadoop.hive.metastore.ObjectStore
2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
metastore.ObjectStore: RawStore: 
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager: 
null will be shutdown
2025-03-09T09:16:41,138  INFO [TThreadPoolServer WorkerProcess-214] 
metastore.ObjectStore: RawStore: 
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72, with PersistenceManager: 
org.datanucleus.api.jdo.JDOPersistenceManager@3ffb4518 created in the thread 
with id: 1576
2025-03-09T09:16:41,139  INFO [TThreadPoolServer WorkerProcess-214] 
metastore.HiveMetaStore: Created RawStore: 
org.apache.hadoop.hive.metastore.ObjectStore@5894fa72 from thread id: 1576
2025-03-09T09:16:45,715 ERROR [TThreadPoolServer WorkerProcess-214] 
metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap 
space{noformat}
The first OOM error has the following stacktrace:
{noformat}
2025-03-08T07:04:58,313 ERROR [TThreadPoolServer WorkerProcess-32] 
metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.decode(StringCoding.java:215)
        at java.lang.String.<init>(String.java:463)
        at org.postgresql.core.Encoding.decode(Encoding.java:284)
        at org.postgresql.core.Encoding.decode(Encoding.java:295)
        at org.postgresql.jdbc.PgResultSet.getString(PgResultSet.java:2256)
        at 
com.zaxxer.hikari.pool.HikariProxyResultSet.getString(HikariProxyResultSet.java)
        at 
org.datanucleus.store.rdbms.mapping.column.LongVarcharColumnMapping.getString(LongVarcharColumnMapping.java:102)
        at 
org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.getString(SingleFieldMapping.java:188)
        at 
org.datanucleus.store.rdbms.fieldmanager.ResultSetGetter.fetchStringField(ResultSetGetter.java:133)
        at 
org.datanucleus.state.StateManagerImpl.replacingStringField(StateManagerImpl.java:1986)
        at 
org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceField(MNotificationLog.java)
        at 
org.apache.hadoop.hive.metastore.model.MNotificationLog.dnReplaceFields(MNotificationLog.java)
        at 
org.datanucleus.state.StateManagerImpl.replaceFields(StateManagerImpl.java:4352)
        at 
org.datanucleus.store.rdbms.query.PersistentClassROF$1.fetchFields(PersistentClassROF.java:528)
        at 
org.datanucleus.state.StateManagerImpl.loadFieldValues(StateManagerImpl.java:3743)
        at 
org.datanucleus.state.StateManagerImpl.initialiseForHollow(StateManagerImpl.java:383)
        at 
org.datanucleus.state.ObjectProviderFactoryImpl.newForHollow(ObjectProviderFactoryImpl.java:99)
        at 
org.datanucleus.ExecutionContextImpl.findObject(ExecutionContextImpl.java:3199)
        at 
org.datanucleus.store.rdbms.query.PersistentClassROF.findObjectWithIdAndLoadFields(PersistentClassROF.java:523)
        at 
org.datanucleus.store.rdbms.query.PersistentClassROF.getObject(PersistentClassROF.java:456)
        at 
org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
        at 
org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:409)
        at 
org.datanucleus.store.rdbms.query.ForwardQueryResult.processNumberOfResults(ForwardQueryResult.java:137)
        at 
org.datanucleus.store.rdbms.query.ForwardQueryResult.advanceToEndOfResultSet(ForwardQueryResult.java:165)
        at 
org.datanucleus.store.rdbms.query.ForwardQueryResult.closingConnection(ForwardQueryResult.java:291)
        at 
org.datanucleus.store.query.AbstractQueryResult.disconnect(AbstractQueryResult.java:105)
        at 
org.datanucleus.store.rdbms.query.AbstractRDBMSQueryResult.disconnect(AbstractRDBMSQueryResult.java:282)
        at 
org.datanucleus.store.rdbms.query.JDOQLQuery$2.transactionPreClose(JDOQLQuery.java:725)
        at 
org.datanucleus.store.connection.AbstractManagedConnection.transactionPreClose(AbstractManagedConnection.java:117)
        at 
org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionPreCommit(ConnectionManagerImpl.java:514)
        at 
org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:397)
        at 
org.datanucleus.TransactionImpl.commit(TransactionImpl.java:287){noformat}
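
The OOM happens while DataNucleus materializes MNotificationLog rows, which 
points at a large notification log backlog (or oversized MESSAGE payloads) in 
the metastore database. Below is a hedged sketch for checking this in the 
backing Postgres instance; psycopg2 and the connection parameters are 
assumptions, and the table/column names follow the standard HMS schema, so 
verify them against the actual metastore DB:
{code:python}
# Hedged diagnostic sketch: measure the notification log backlog that the HMS
# handler scans when Kudu's listener polls for events.
import psycopg2

# Connection parameters are placeholders for the local dev metastore DB.
conn = psycopg2.connect(host='localhost', dbname='hive_metastore',
                        user='hiveuser', password='password')
cur = conn.cursor()
cur.execute('''
    SELECT count(*),
           coalesce(sum(octet_length("MESSAGE")), 0) AS total_message_bytes
    FROM "NOTIFICATION_LOG"
''')
print(cur.fetchone())  # (row count, total bytes of serialized event messages)
{code}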


