[ https://issues.apache.org/jira/browse/KAFKA-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax updated KAFKA-19629:
------------------------------------
    Affects Version/s: 3.8.0
                           (was: 3.8.1)
                           (was: 3.9.1)

> Deadlock in Kafka Streams when processing Interactive Queries and state store updates concurrently
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19629
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19629
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>         Environment: Kafka Streams, kotlin, linux, docker. JDK 21
>            Reporter: Evgheni Popusoi
>            Priority: Major
>         Attachments: thread-dump-1.txt, thread-dump-2.txt
>
>
> We are using a Kafka Streams topology that continuously writes large volumes
> of data into a RocksDB state store with stable throughput. In parallel,
> another thread executes Interactive Query (IQ) requests against the same
> local state store.
> When the number of IQ requests in the queue grows (≈50+), the application
> enters a *deadlock state*.
>
> *Investigation:*
> Using a thread dump, we discovered a lock inversion between RocksDB
> operations:
> * {{RocksDBStore.put}}
> ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> * {{RocksDBStore.range}}
> ** blocked on {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
>
> This indicates that *{{put}} and {{range}} acquire the same locks but in
> different order*, which leads to deadlock under concurrent load.
>
> *Expected Behavior:*
> Kafka Streams API should guarantee deadlock-free operation. Store writes
> ({{put}}) and IQ reads ({{range}}) should not block each other in a
> way that leads to lock inversion.
>
> *Steps to Reproduce:*
> # Create a Kafka Streams topology with a RocksDB state store receiving
> continuous writes.
> # In a parallel thread, issue a high number of Interactive Query {{range}}
> requests (≈50+ queued).
> # Observe that the system eventually enters deadlock.
>
> *Impact:*
> * Application stops processing data.
> * Interactive Queries fail indefinitely.
> * Requires manual restart to recover.
>
> *Notes:*
> * Appears to be a lock ordering bug in {{RocksDBStore}}.
> * Expected the Streams API to coordinate thread-safety and prevent such
> deadlocks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
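
For illustration only: the lock inversion described in the report goes away, in principle, when every code path takes the two monitors in the same order. The sketch below is a hedged, standalone example. `LockOrderSketch`, `storeLock`, and `positionLock` are hypothetical stand-ins for the `RocksDBStore` and `Position` monitors named in the thread dump. It is not Kafka Streams code and not necessarily how KAFKA-19629 is or will be fixed.

```java
// Hypothetical sketch of consistent lock ordering; not Kafka Streams code.
class LockOrderSketch {
    // Stand-ins for the two monitors seen in the thread dump (assumed names).
    private final Object storeLock = new Object();
    private final Object positionLock = new Object();
    private int writes = 0;
    private int reads = 0;

    // Write path: store monitor first, then position monitor.
    void put() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                writes++;
            }
        }
    }

    // Read path: same order as put(). Taking positionLock first here
    // would recreate the inversion described in the report.
    void range() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                reads++;
            }
        }
    }

    // Runs one writer and one reader concurrently and returns the total
    // number of completed operations.
    static int run(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.put();
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.range();
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        // join() establishes happens-before, so both counters are visible here.
        return sketch.writes + sketch.reads;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));
    }
}
```

Because both paths nest the monitors identically, neither thread can hold one lock while waiting on the other in the opposite order, so both loops always run to completion.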