On Jun 29, 2016, at 22:44, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> One way to mentally quantify your state store usage is to consider the
> total key space in your reduceByKey() operator, and multiply by the average
> key-value pair size. Then you need to consider the RocksDB write / space
> amplification factor as well.

That makes sense, thank you!
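
Just to sanity-check it with some made-up numbers: 10 million distinct keys in
the reduceByKey() at roughly 200 bytes per key-value pair would be about 2 GB
of raw state, and with a RocksDB space amplification factor of, say, 2-3x,
something like 4-6 GB on disk. (Illustrative figures only, not measurements
from my app.)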

> Currently Kafka Streams hard-codes some RocksDB config values, such as the
> block size, to achieve good write performance at the cost of write
> amplification, but we are now working on exposing those configs to the
> users so that they can override them themselves:
> 
> https://issues.apache.org/jira/browse/KAFKA-3740

That looks excellent for the next release ;)
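
In case it helps others following along, here is a rough sketch of what I
imagine such an override could look like once KAFKA-3740 lands. The interface
name, method signature, and config key below are my guesses from the ticket
discussion, not a released API:

    import java.util.Map;

    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;

    // Hypothetical per-store hook, assuming KAFKA-3740 exposes the raw
    // RocksDB Options object for each state store.
    public class CustomRocksDBConfig implements RocksDBConfigSetter {
        @Override
        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            // Override the block size that Kafka Streams currently
            // hard-codes.
            final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            tableConfig.setBlockSize(16 * 1024L);
            options.setTableFormatConfig(tableConfig);
        }
    }

which would then presumably be registered with something like
props.put("rocksdb.config.setter", CustomRocksDBConfig.class)
(again, guessing at the eventual config key).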

In the meantime, do you know anything specific about RocksDB’s behavior with
the LOG and LOG.old.{timestamp} files? (They don’t seem to be directly related
to the storage space required by the actual state itself, unless I’m
misunderstanding the word “log”; it is a bit overloaded in this community.) Is
there something I can do in code to affect this? Is there some way to
understand or predict the growth patterns of these files? And does RocksDB
have some kind of built-in cleanup for them, or do I need to set up a cron job
of my own?
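
For what it’s worth, these are the knobs I’ve found in the RocksDB Java API
that look relevant. From what I can tell from the RocksDB docs they govern the
informational LOG / LOG.old.* files rather than the data or write-ahead log
files, but whether Kafka Streams will let me set them is exactly my question:

    import org.rocksdb.InfoLogLevel;
    import org.rocksdb.Options;

    public class LogFileTuning {
        public static void main(final String[] args) throws Exception {
            final Options options = new Options();
            // Roll the LOG file once it reaches ~10 MB.
            options.setMaxLogFileSize(10 * 1024 * 1024);
            // Retain at most 5 rotated LOG.old.* files.
            options.setKeepLogFileNum(5);
            // Only log warnings and above, to slow growth further.
            options.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
        }
    }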

Thanks!
Avi
