KStreams in-memory state-store

Igor Danis Mon, 27 Jan 2020 06:07:20 -0800

Hi all,

I have a question about Kafka Streams, particularly the in-memory state-store
(/org.apache.kafka.streams.state.internals.InMemoryKeyValueStore/).

I believe that the topology is irrelevant here, but let's say I have one source topic with a single partition feeding data into one stateful processor associated with a single in-memory state-store.

This results in a topology with a single task.

This topology is run in 2 application instances:
- First instance (A) runs the task in active mode
- Second instance (B) runs the task as standby
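
For illustration, a minimal sketch of the kind of configuration that yields this active/standby layout. The application id and broker address are placeholders (assumptions, not taken from our actual deployment); the only setting the description above relies on is num.standby.replicas=1, so that the second instance keeps a standby copy of the task's state.

```java
import java.util.Properties;

public class StandbyConfigSketch {

    // Builds the Kafka Streams properties both instances (A and B) would share.
    static Properties streamsConfig() {
        Properties props = new Properties();
        // Hypothetical application id; both instances must use the same one.
        props.setProperty("application.id", "low-latency-app");
        // Hypothetical broker address.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // One standby replica per task: with two instances, one runs the task
        // as active and the other keeps a standby copy fed from the changelog.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; in a real application these would be passed to KafkaStreams.
        streamsConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```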

Our use-case is low-latency processing, hence we need to keep rebalance downtime as low as possible (ideally a few hundred milliseconds).

Let's say that we kill instance A, which triggers a rebalance, and B takes over the processing.

We found that, when the task on B transitions from STANDBY into ACTIVE mode, it closes the in-memory state-store and effectively throws away any state read from the changelog while it was in STANDBY. Neither checkpoints nor state are preserved.

Subsequently, in ACTIVE mode, it reads the changelog again with the restore-consumer. Depending on the size of the changelog, this operation can take a few minutes, during which no processing is done. This happens despite B having an up-to-date standby replica, which is really counterintuitive. What is the reason for this behavior?

Note that we initially used a persistent RocksDB state-store, but we had similar issues with latency (only this time it was due to RocksDB compaction, I believe), so we prefer an in-memory solution.

If this question is more appropriate for the developer mailing list, please let me know.


Thanks and Regards,
Igor
