[ https://issues.apache.org/jira/browse/KAFKA-17766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889298#comment-17889298 ]

Anshul Goyal commented on KAFKA-17766:
--------------------------------------

[~satish.duggana] [~ckamal] Could you have a look at my approach here:
[https://github.com/apache/kafka/pull/17492]

 

Thanks in advance.

> TopicBasedRemoteLogMetadataManager stuck in close
> -------------------------------------------------
>
>                 Key: KAFKA-17766
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17766
>             Project: Kafka
>          Issue Type: Bug
>          Components: build, Tiered-Storage
>            Reporter: David Arthur
>            Assignee: Anshul Goyal
>            Priority: Major
>         Attachments: GradleWorkerMain-7952.txt
>
>
> During a CI run, a build timed out because this class was stuck in its
> close method.
>  
> {code:java}
> "Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait()  [0x00007fcc853f9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(java.base@11.0.24/Native Method)
>     - waiting on <no object reference available>
>     at java.lang.Thread.join(java.base@11.0.24/Thread.java:1300)
>     - waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
>     at java.lang.Thread.join(java.base@11.0.24/Thread.java:1375)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
>  {code}
>  
> {code:java}
> "RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition  [0x00007fcbe05fe000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
>     - parking to wait for  <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.24/AbstractQueuedSynchronizer.java:885)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.24/AbstractQueuedSynchronizer.java:917)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.24/AbstractQueuedSynchronizer.java:1240)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.24/ReentrantReadWriteLock.java:959)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
>     at java.lang.Thread.run(java.base@11.0.24/Thread.java:829) {code}
>  
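> It seems we are joining the initialization thread on the assumption that it
> has completed (or will complete). The dumps show close() blocked in
> Thread.join() on the initialization thread, while that thread is parked
> waiting for the ReentrantReadWriteLock write lock in initializeResources().
> This appears to be a lock race between the close method and the
> initialization thread that results in a deadlock.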
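
The suspected interleaving can be reproduced in a small, self-contained sketch. It is a hypothetical simplification, not Kafka's code and not necessarily the approach in the linked pull request: the class name, the `closing` flag, and the methods `closeHoldingLock`/`closeWithoutHoldingLock` are all made up for illustration. Only the general shape (an initialization thread that needs a `ReentrantReadWriteLock` write lock, and a `close()` that joins that thread) comes from the dumps above, and it assumes close() holds the write lock while joining, which the dumps suggest but do not show directly. A timed `join` is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the close()/initialization interaction seen in the
// thread dumps. It is NOT Kafka's TopicBasedRemoteLogMetadataManager code.
class CloseJoinDeadlockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final Thread initThread = new Thread(this::initializeResources, "RLMMInitializationThread");

    CloseJoinDeadlockSketch() {
        // Daemon so a stuck initialization thread never keeps the JVM alive.
        initThread.setDaemon(true);
    }

    // Background initialization: needs the write lock, like initializeResources() in the dump.
    private void initializeResources() {
        lock.writeLock().lock();
        try {
            if (closing.get()) {
                return; // close() has started; skip creating resources
            }
            // ... create producer/consumer resources here (omitted) ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Waits up to timeoutMs for t to exit; returns true if it is no longer alive.
    private static boolean joinWithTimeout(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return !t.isAlive();
    }

    // Problematic ordering (assumed): close() takes the write lock, then joins a thread
    // that is waiting for that same lock. An untimed join() here would never return.
    boolean closeHoldingLock(long timeoutMs) {
        closing.set(true);
        lock.writeLock().lock();
        try {
            initThread.start(); // started here only to force the losing interleaving
            return joinWithTimeout(initThread, timeoutMs);
        } finally {
            lock.writeLock().unlock(); // releasing lets the init thread acquire it and exit
        }
    }

    // Safer ordering: signal shutdown and join first, then take the lock to release resources.
    boolean closeWithoutHoldingLock(long timeoutMs) {
        initThread.start();
        closing.set(true);
        boolean finished = joinWithTimeout(initThread, timeoutMs); // lock is free here
        lock.writeLock().lock();
        try {
            // ... close resources here (omitted) ...
        } finally {
            lock.writeLock().unlock();
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("join while holding lock finished: "
                + new CloseJoinDeadlockSketch().closeHoldingLock(200));
        System.out.println("join without holding lock finished: "
                + new CloseJoinDeadlockSketch().closeWithoutHoldingLock(5000));
    }
}
```

In the first variant the join can only time out, because the thread being joined is parked on a lock the joining thread owns. Signalling shutdown and joining before taking the lock is one way to break the cycle; simply switching to a bounded `join(timeout)` would only turn the hang into a timeout.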



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
