Sophie Blee-Goldman created KAFKA-9062:
------------------------------------------

             Summary: RocksDB writes may stall after bulk loading lots of state
                 Key: KAFKA-9062
                 URL: https://issues.apache.org/jira/browse/KAFKA-9062
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: Sophie Blee-Goldman


RocksDB may stall writes when background compactions or flushes cannot keep 
up. As a result, a StateStore#put call within Streams can block for an 
unbounded time, and the consumer may get kicked from the group if the stall 
does not ease up within the max poll interval (max.poll.interval.ms).

Example: when restoring large amounts of state from scratch, we use the 
strategy recommended by RocksDB of turning off automatic compactions and 
dumping everything into L0. We do batch somewhat, but do not sort these small 
batches before loading into the db, so we end up with a large number of 
unsorted L0 files.
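
To illustrate why batches loaded in arrival order leave L0 in a poor state, 
here is a toy model in plain Java. It is not RocksDB code: each "file" is 
reduced to the [min, max] key range of one batch, and the class and method 
names (L0OverlapSketch, toFileRanges, countOverlappingPairs) are hypothetical. 
It only shows that batches cut from unsorted input tend to overlap in key 
range, while batches cut from globally sorted input do not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model only: a "file" is just the [min, max] key range of one loaded batch.
final class L0OverlapSketch {

    /** Splits keys into batches of batchSize and returns each batch's {min, max} range. */
    static List<long[]> toFileRanges(List<Long> keys, int batchSize) {
        List<long[]> ranges = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            List<Long> batch = keys.subList(start, Math.min(start + batchSize, keys.size()));
            ranges.add(new long[] {Collections.min(batch), Collections.max(batch)});
        }
        return ranges;
    }

    /** Counts pairs of files whose key ranges intersect. */
    static int countOverlappingPairs(List<long[]> ranges) {
        int pairs = 0;
        for (int i = 0; i < ranges.size(); i++) {
            for (int j = i + 1; j < ranges.size(); j++) {
                boolean overlap = ranges.get(i)[0] <= ranges.get(j)[1]
                        && ranges.get(j)[0] <= ranges.get(i)[1];
                if (overlap) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Long> sorted = new ArrayList<>();
        for (long k = 0; k < 1_000; k++) {
            sorted.add(k);
        }
        // Simulate changelog arrival order, which is not key order.
        List<Long> arrival = new ArrayList<>(sorted);
        Collections.shuffle(arrival, new Random(42));

        System.out.println("overlapping pairs, arrival order:   "
                + countOverlappingPairs(toFileRanges(arrival, 100)));
        System.out.println("overlapping pairs, globally sorted: "
                + countOverlappingPairs(toFileRanges(sorted, 100)));
    }
}
```

The more file pairs overlap, the more data a later compaction has to read and 
merge together, which is the cost described in this ticket.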

When restoration is complete and we toggle the db back to normal (not bulk 
loading) settings, a background compaction is triggered to merge all these into 
the next level. This background compaction can take a long time to merge 
unsorted keys, especially when the amount of data is quite large.

Any new writes issued while the number of L0 files exceeds the configured 
limit will be stalled until the compaction finishes, so processing after 
restoring from scratch can block beyond the max poll interval.
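
The stall condition above can be sketched as a small decision function. This 
is a simplified model, not RocksDB or Streams code: the class WriteStallSketch 
and its methods are hypothetical, the thresholds stand in for the RocksDB 
options level0_slowdown_writes_trigger and level0_stop_writes_trigger, and the 
values 20 and 36 are assumed defaults that should be checked against the 
RocksDB version in use. The 5 minute poll interval is the Kafka consumer 
default for max.poll.interval.ms; the 12 minute compaction time is made up.

```java
import java.time.Duration;

// Simplified model of the L0-based write stall and its effect on the poll loop.
final class WriteStallSketch {

    enum WriteState { NORMAL, SLOWED, STOPPED }

    /** Classifies write behavior from the current L0 file count and the two triggers. */
    static WriteState classify(int l0Files, int slowdownTrigger, int stopTrigger) {
        if (l0Files >= stopTrigger) {
            return WriteState.STOPPED;
        }
        if (l0Files >= slowdownTrigger) {
            return WriteState.SLOWED;
        }
        return WriteState.NORMAL;
    }

    /** True if a put() blocked for estimatedStall would outlast the poll interval. */
    static boolean exceedsPollInterval(Duration estimatedStall, Duration maxPollInterval) {
        return estimatedStall.compareTo(maxPollInterval) > 0;
    }

    public static void main(String[] args) {
        int slowdownTrigger = 20; // assumed value for level0_slowdown_writes_trigger
        int stopTrigger = 36;     // assumed value for level0_stop_writes_trigger
        int l0FilesAfterRestore = 500; // hypothetical count after a large bulk load

        WriteState state = classify(l0FilesAfterRestore, slowdownTrigger, stopTrigger);
        Duration compactionTime = Duration.ofMinutes(12); // hypothetical
        Duration maxPollInterval = Duration.ofMinutes(5); // consumer default

        System.out.println("write state: " + state);
        System.out.println("risk of leaving the group: "
                + (state == WriteState.STOPPED
                        && exceedsPollInterval(compactionTime, maxPollInterval)));
    }
}
```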



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
