Hi there,

I have a question about how to achieve 'back-pressure' for windowed
streams.

Say I have a pipeline that consumes from a topic on a windowed basis,
does some processing whenever punctuate() is called, and produces into
another topic.
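
To make this concrete, here is a rough sketch of the kind of processor I
mean, using the old Processor API with punctuate(long). The store name,
the punctuate interval, and the class name are just placeholders, and the
store would of course have to be attached to the topology separately:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class WindowedForwardingProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // "window-store" is a placeholder state store name
        this.store = (KeyValueStore<String, String>) context.getStateStore("window-store");
        // punctuate once per minute (placeholder interval)
        context.schedule(60_000L);
    }

    @Override
    public void process(String key, String value) {
        // just buffer incoming records in the state store
        store.put(key, value);
    }

    @Override
    public void punctuate(long timestamp) {
        // drain everything buffered so far and forward it downstream
        try (KeyValueIterator<String, String> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                context.forward(entry.key, entry.value);
                store.delete(entry.key);
            }
        }
        context.commit();
    }

    @Override
    public void close() { }
}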

If the combined incoming rate across all consumers is 10M records/second,
but the combined processing rate of all the punctuate() calls is only about
5M records/second, then two things can happen:

- If an in-memory store is used, on-heap memory is gradually exhausted
until GC pauses kick in, which leads to unnecessary rebalancing and other
problems.

- If an off-heap store (RocksDB) is used, then over time punctuate() takes
longer and longer, and eventually performance becomes terrible, plus
other effects I am not aware of yet.

I understand the reason for these behaviors is that Kafka Streams applies
back-pressure by checking the consumer buffer sizes and the StreamThread's
buffer size, but does NOT check the state stores.
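
For reference, these are the knobs I believe govern that existing
buffer-based back-pressure (please correct me if I am wrong; the
application id and broker address are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class BackpressureConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-pipeline");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        // Streams pauses a partition once this many records sit in its internal buffer
        props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 1000);
        // caps how many records each consumer poll() pulls from the broker
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        return props;
    }
}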

I think a solution to this is to 'add more Kafka Streams instances'. By
this I mean that maybe today I need a processing rate of 1M/second, and
tomorrow I need 5M/second. A mechanism is then needed for Kafka Streams to
detect this and inform the people who can add new instances, either
manually or, better, automatically. And while waiting for people to react,
the currently running Kafka Streams applications should not crash, but
could slow down a little (by checking state-store conditions, say the
number of records cached, or the total time taken by previous punctuates?).
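
Something like the following inside punctuate() is what I am imagining. To
be clear, this is not an existing Kafka Streams feature, just an
illustration of the self-throttling idea as a drop-in replacement for
punctuate() in the processor sketch above; the thresholds are made up:

    private static final long MAX_RECORDS_PER_PUNCTUATE = 100_000L;  // made-up threshold
    private static final long SLOW_PUNCTUATE_MILLIS = 30_000L;       // made-up threshold
    private long lastPunctuateMillis = 0L;

    @Override
    public void punctuate(long timestamp) {
        long start = System.currentTimeMillis();
        long drained = 0L;

        // cap the amount of work per punctuation so one call cannot run away
        try (KeyValueIterator<String, String> it = store.all()) {
            while (it.hasNext() && drained < MAX_RECORDS_PER_PUNCTUATE) {
                KeyValue<String, String> entry = it.next();
                context.forward(entry.key, entry.value);
                store.delete(entry.key);
                drained++;
            }
        }

        lastPunctuateMillis = System.currentTimeMillis() - start;
        if (lastPunctuateMillis > SLOW_PUNCTUATE_MILLIS) {
            // this is where I would want a metric / alert so that someone
            // (or some autoscaler) knows it is time to add instances
            System.err.println("punctuate took " + lastPunctuateMillis
                    + " ms; consider adding instances");
        }
        context.commit();
    }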

Am I understanding correctly?

Thanks
Tianji
