On Jun 29, 2016, at 22:44, Guozhang Wang <wangg...@gmail.com> wrote:
>
> One way to mentally quantify your state store usage is to consider the
> total key space in your reduceByKey() operator, and multiply by the average
> key-value pair size. Then you need to consider the RocksDB write / space
> amplification factor as well.

That makes sense, thank you!
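For anyone else doing this arithmetic later, here's the back-of-envelope
calculation I'm now using. The key count, entry size, and amplification
factor below are made-up placeholders, not measurements from my topology:

    // Back-of-envelope sizing for a RocksDB-backed state store, following
    // the advice above: distinct keys * average entry size * amplification.
    // All three inputs are illustrative guesses, not measured values.
    public class StateStoreSizeEstimate {

        public static void main(final String[] args) {
            final long distinctKeys = 50_000_000L;   // total key space of the reduceByKey()
            final long avgEntryBytes = 200L;         // average serialized key + value size
            final double spaceAmplification = 1.5;   // rough RocksDB space amplification

            final double estimatedBytes = distinctKeys * avgEntryBytes * spaceAmplification;
            System.out.printf("Estimated on-disk state: ~%.1f GiB%n",
                    estimatedBytes / (1024.0 * 1024.0 * 1024.0));
        }
    }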
> Currently Kafka Streams hard-codes some RocksDB config values, such as
> block size, to achieve good write performance at the cost of write
> amplification, but we are now working on exposing those configs to the
> users so that they can override them themselves:
>
> https://issues.apache.org/jira/browse/KAFKA-3740

That looks excellent for the next release ;)

In the meantime, do you know anything specific about the RocksDB behavior
with the LOG and LOG.old.{timestamp} files? (They don't seem to me to be
directly related to the storage space required by the actual state itself,
unless I'm misunderstanding the word "log", which is a bit overloaded in
this community.) Is there something I can do in code to affect this? Or is
there some way to understand/predict the growth patterns of these files:
does RocksDB have some kind of built-in cleanup feature for them, or do I
need to set up a cron job of my own?

Thanks!
Avi
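P.S. In case it helps anyone else watching KAFKA-3740: if it ends up
exposing a config-setter hook roughly like the JIRA describes, I imagine
capping those files via the RocksDB JNI options, something like the
speculative sketch below. The setters on org.rocksdb.Options are real, but
the hook's name and signature are my guesses at an API that hasn't shipped,
so treat the whole thing as unverified:

    import java.util.Map;
    import org.rocksdb.InfoLogLevel;
    import org.rocksdb.Options;

    // Speculative sketch, assuming KAFKA-3740 exposes a per-store hook
    // shaped like this; the commented-out interface is a guess, not
    // released API, so this class compiles standalone in the meantime.
    public class QuietRocksDbConfigSetter /* implements RocksDBConfigSetter */ {

        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            // Keep at most a few rotated LOG.old.* files around...
            options.setKeepLogFileNum(4);
            // ...and roll the active LOG file once it reaches ~10 MB.
            options.setMaxLogFileSize(10 * 1024 * 1024);
            // Log only warnings and above, to slow the file's growth.
            options.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
        }
    }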