[ https://issues.apache.org/jira/browse/KAFKA-17766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889298#comment-17889298 ]
Anshul Goyal commented on KAFKA-17766:
--------------------------------------

[~satish.duggana] [~ckamal] Could you take a look at my approach here: [https://github.com/apache/kafka/pull/17492]

Thanks in advance.

> TopicBasedRemoteLogMetadataManager stuck in close
> -------------------------------------------------
>
>                 Key: KAFKA-17766
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17766
>             Project: Kafka
>          Issue Type: Bug
>          Components: build, Tiered-Storage
>            Reporter: David Arthur
>            Assignee: Anshul Goyal
>            Priority: Major
>         Attachments: GradleWorkerMain-7952.txt
>
>
> During a CI run, a build timed out because this class was stuck in its
> close method.
>
> {code:java}
> "Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait() [0x00007fcc853f9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(java.base@11.0.24/Native Method)
>         - waiting on <no object reference available>
>         at java.lang.Thread.join(java.base@11.0.24/Thread.java:1300)
>         - waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
>         at java.lang.Thread.join(java.base@11.0.24/Thread.java:1375)
>         at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
> {code}
>
> {code:java}
> "RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition [0x00007fcbe05fe000]
>    java.lang.Thread.State: WAITING (parking)
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
>         - parking to wait for <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.24/AbstractQueuedSynchronizer.java:885)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.24/AbstractQueuedSynchronizer.java:917)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.24/AbstractQueuedSynchronizer.java:1240)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.24/ReentrantReadWriteLock.java:959)
>         at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
>         at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
>         at java.lang.Thread.run(java.base@11.0.24/Thread.java:829)
> {code}
>
> It seems we are joining the initialization thread on the assumption that it
> has completed (or will complete). This appears to be a lock race between the
> close method and the initialization thread, which results in a deadlock.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
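
The lock race described in the issue can be illustrated with a small, self-contained sketch. It is not the Kafka code and not the change proposed in the linked pull request. The class `CloseJoinSketch`, the method `initFinishedBeforeJoinReturned`, and the thread name `init-sketch` are hypothetical. The sketch assumes that close() holds the write lock while it joins the initialization thread, which is consistent with the two thread dumps but is not confirmed by them. A bounded join stands in for the unbounded one so that the demonstration terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the manager; this is not the actual Kafka class.
class CloseJoinSketch {

    /**
     * Runs one "close" against an init thread that needs the same write lock.
     * Returns true if the init thread had finished when the bounded join returned.
     */
    static boolean initFinishedBeforeJoinReturned(boolean joinUnderLock, long joinTimeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch closeHoldsLock = new CountDownLatch(1);

        Thread init = new Thread(() -> {
            try {
                // Force the unlucky interleaving: close takes the lock first.
                closeHoldsLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            lock.writeLock().lock(); // blocks for as long as close holds the lock
            try {
                // Resource initialization would happen here.
            } finally {
                lock.writeLock().unlock();
            }
        }, "init-sketch");
        init.start();

        try {
            lock.writeLock().lock();
            try {
                closeHoldsLock.countDown();
                if (joinUnderLock) {
                    // Hazard: the init thread needs the lock held here, so it
                    // cannot exit. An unbounded join() would wait forever.
                    init.join(joinTimeoutMs);
                    return !init.isAlive();
                }
            } finally {
                lock.writeLock().unlock();
            }
            // Alternative ordering: the lock is released before joining, so
            // the init thread can acquire it, finish, and exit.
            init.join(joinTimeoutMs);
            return !init.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("join under lock, init finished: "
                + initFinishedBeforeJoinReturned(true, 200));
        System.out.println("join after unlock, init finished: "
                + initFinishedBeforeJoinReturned(false, 2000));
    }
}
```

In the first call the join can only time out, because the init thread is parked on the write lock until close releases it. In the second call the join is expected to return promptly. Releasing the lock before joining is only one possible ordering; interrupting the init thread or using a timed lock acquisition would be other options, and which one suits the real class is a question for the pull request review.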