[ 
https://issues.apache.org/jira/browse/KAFKA-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-19629:
------------------------------------
    Affects Version/s: 3.8.0
                           (was: 3.8.1)
                           (was: 3.9.1)

> Deadlock in Kafka Streams when processing Interactive Queries and state store 
> updates concurrently
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19629
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19629
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>         Environment: Kafka Streams, Kotlin, Linux, Docker, JDK 21
>            Reporter: Evgheni Popusoi
>            Priority: Major
>         Attachments: thread-dump-1.txt, thread-dump-2.txt
>
>
> We are using a Kafka Streams topology that continuously writes large volumes 
> of data into a RocksDB state store with stable throughput. In parallel, 
> another thread executes Interactive Query (IQ) requests against the same 
> local state store.
> When the number of IQ requests in the queue grows (≈50+), the application 
> enters a *deadlock state*.
> *Investigation:*
> Using a thread dump, we discovered a lock inversion between RocksDB 
> operations:
>  * {{RocksDBStore.put}}
>  ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
>  ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
>  * {{RocksDBStore.range}}
>  ** blocked on 
> {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
>  ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> This indicates that *{{put}} and {{range}} acquire the same two locks in 
> opposite order*, which leads to deadlock under concurrent load.
> *Expected Behavior:*
> The Kafka Streams API should guarantee deadlock-free operation. Store writes 
> ({{put}}) and IQ reads ({{range}}) should not block each other in a 
> way that leads to lock inversion.
> *Steps to Reproduce:*
>  # Create a Kafka Streams topology with a RocksDB state store receiving 
> continuous writes.
>  # In a parallel thread, issue a high number of Interactive Query {{range}} 
> requests (≈50+ queued).
>  # Observe that the system eventually enters deadlock.
> *Impact:*
>  * Application stops processing data.
>  * Interactive Queries fail indefinitely.
>  * Requires manual restart to recover.
> *Notes:*
>  * Appears to be a lock ordering bug in {{RocksDBStore}}.
>  * Expected the Streams API to coordinate thread-safety and prevent such 
> deadlocks.
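
The lock inversion described in the report can be illustrated in isolation with a minimal Java sketch. This is not the RocksDBStore code: the class name, the two lock objects, and the methods putLike and rangeLike are hypothetical stand-ins for the two monitors and the two operations seen in the thread dump. The sketch forces each thread to take its first monitor before either requests its second, then asks the JVM's own deadlock detector (ThreadMXBean.findDeadlockedThreads) to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockInversionSketch {
    // Hypothetical stand-ins for the two monitors named in the thread dump.
    private final Object storeLock = new Object();
    private final Object positionLock = new Object();

    // Reaches zero once both threads hold their first monitor.
    private final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    // Write path as reported: holds the store monitor, then wants the position monitor.
    void putLike() {
        synchronized (storeLock) {
            awaitOther();
            synchronized (positionLock) { /* the write would happen here */ }
        }
    }

    // Read path as reported: holds the position monitor, then wants the store monitor.
    void rangeLike() {
        synchronized (positionLock) {
            awaitOther();
            synchronized (storeLock) { /* the read would happen here */ }
        }
    }

    // Makes the interleaving deterministic: wait until the other thread also holds a monitor.
    private void awaitOther() {
        bothHoldFirstLock.countDown();
        try {
            bothHoldFirstLock.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs both paths concurrently and polls the JVM deadlock detector for up to about 5 seconds.
    static boolean reproducesDeadlock() {
        LockInversionSketch sketch = new LockInversionSketch();
        Thread writer = new Thread(sketch::putLike, "writer");
        Thread reader = new Thread(sketch::rangeLike, "reader");
        writer.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        reader.setDaemon(true);
        writer.start();
        reader.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproducesDeadlock());
    }
}
```

In this sketch, changing rangeLike to take storeLock before positionLock removes the cycle, since both paths then acquire the monitors in the same order. Whether that is the appropriate fix for RocksDBStore is not something the sketch establishes.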



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
