[ https://issues.apache.org/jira/browse/KAFKA-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015527#comment-18015527 ]
Matthias J. Sax commented on KAFKA-19629:
-----------------------------------------

Thanks for filing this ticket. It seems the changes from KAFKA-15770 introduced this issue.

I am not 100% sure yet what the right fix is, but not acquiring locks in the same order on all code paths is certainly incorrect. We do lock the `Position` object inside `StoreQueryUtils#handleBasicQueries(...)`; maybe we would need to lock the passed-in `store` first?

> Deadlock in Kafka Streams when processing Interactive Queries and state store
> updates concurrently
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19629
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19629
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>         Environment: Kafka Streams, kotlin, linux, docker. JDK 21
>            Reporter: Evgheni Popusoi
>            Priority: Major
>         Attachments: thread-dump-1.txt, thread-dump-2.txt
>
>
> We are using a Kafka Streams topology that continuously writes large volumes
> of data into a RocksDB state store with stable throughput. In parallel,
> another thread executes Interactive Query (IQ) requests against the same
> local state store.
> When the number of IQ requests in the queue grows (≈50+), the application
> enters a *deadlock state*.
> *Investigation:*
> Using a thread dump, we discovered a lock inversion between RocksDB
> operations:
> * {{RocksDBStore.put}}
> ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> * {{RocksDBStore.range}}
> ** blocked on
> {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> This indicates that *{{put}} and {{range}} acquire the same locks but in
> different order*, which leads to deadlock under concurrent load.
> *Expected Behavior:*
> Kafka Streams API should guarantee deadlock-free operation. Store writes
> ({{put}}) and IQ reads ({{range}}) should not block each other in a
> way that leads to lock inversion.
> *Steps to Reproduce:*
> # Create a Kafka Streams topology with a RocksDB state store receiving
> continuous writes.
> # In a parallel thread, issue a high number of Interactive Query {{range}}
> requests (≈50+ queued).
> # Observe that the system eventually enters deadlock.
> *Impact:*
> * Application stops processing data.
> * Interactive Queries fail indefinitely.
> * Requires manual restart to recover.
> *Notes:*
> * Appears to be a lock ordering bug in {{RocksDBStore}}.
> * Expected the Streams API to coordinate thread-safety and prevent such
> deadlocks.
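
For illustration only, the idea of taking the store lock before the `Position` lock on every code path can be sketched in plain Java. This is not the Kafka Streams code and not a proposed patch: `LockOrderingSketch`, `storeLock`, `positionLock`, `rangeCount` and `runConcurrently` are hypothetical names, and two plain monitors stand in for the `RocksDBStore` and `Position` objects seen in the thread dump. Because both the write path and the query path acquire the monitors in the same order, neither thread can hold one lock while waiting for the other in the inverted order.

```java
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LockOrderingSketch {
    // Hypothetical stand-ins for the RocksDBStore and Position monitors.
    private final Object storeLock = new Object();
    private final Object positionLock = new Object();

    private final TreeMap<String, String> data = new TreeMap<>();
    private long offset = 0L;

    // Write path: store lock first, then position lock.
    public void put(String key, String value) {
        synchronized (storeLock) {
            data.put(key, value);
            synchronized (positionLock) {
                offset++;
            }
        }
    }

    // Query path: same order as put(), store lock first, then position lock.
    // Returns the number of keys in the inclusive range [from, to].
    public int rangeCount(String from, String to) {
        synchronized (storeLock) {
            synchronized (positionLock) {
                // A real query would check the position bound here.
                if (offset < 0) {
                    throw new IllegalStateException("negative position");
                }
            }
            return data.subMap(from, true, to, true).size();
        }
    }

    // Reads the position; takes only one lock, so it cannot invert the order.
    public long position() {
        synchronized (positionLock) {
            return offset;
        }
    }

    // Runs one writer and one reader concurrently. Returns true if both
    // finish within the timeout and every write is visible afterwards.
    public static boolean runConcurrently(int iterations) {
        LockOrderingSketch store = new LockOrderingSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> writer = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    store.put("k" + i, "v" + i);
                }
            });
            Future<?> reader = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    store.rangeCount("k", "l");
                }
            });
            writer.get(30, TimeUnit.SECONDS);
            reader.get(30, TimeUnit.SECONDS);
            return store.rangeCount("k", "l") == iterations
                    && store.position() == iterations;
        } catch (Exception e) {
            // Timeout, interruption, or a failure inside a task.
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runConcurrently(2000));
    }
}
```

If `rangeCount` instead took `positionLock` first and `storeLock` second, the two threads could block each other exactly as in the reported dump. Whether locking the store first is acceptable in the real query path (for example with respect to query latency) is a separate question that this sketch does not answer.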