I find this RocksDB tuning guide quite useful regarding your write / space amplification questions:
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

Guozhang

On Thu, Jun 30, 2016 at 8:36 AM, Avi Flax <avi.f...@parkassist.com> wrote:
> On Jun 29, 2016, at 22:44, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > One way to mentally quantify your state store usage is to consider the
> > total key space in your reduceByKey() operator, and multiply by the
> > average key-value pair size. Then you need to consider the RocksDB
> > write / space amplification factor as well.
>
> That makes sense, thank you!
>
> > Currently Kafka Streams hard-write some RocksDB config values such as
> > block size to achieve good write performance with the cost of write
> > amplification, but we are now working on exposing those configs to the
> > users so that they can override themselves:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3740
>
> That looks excellent for the next release ;)
>
> In the meantime, do you know anything specific about the RocksDB behavior
> with the LOG and LOG.old.{timestamp} files? (They don’t seem to me to be
> directly related to the storage space required by the actual state itself,
> unless I’m misunderstanding the word “log” — it is a bit overloaded in this
> community.) Is there something I can do in code to affect this? Or some way
> to understand/predict the growth patterns of these files, whether or not
> RocksDB has some kind of built-in cleanup feature or whether I need to set
> up a cron job on my own?
>
> Thanks!
> Avi

--
-- Guozhang
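P.S. To make the sizing estimate from the quoted text concrete, here is a back-of-envelope sketch. All three numbers below are hypothetical placeholders (in particular, the 1.5x space-amplification factor is an assumption, not a measured RocksDB value); substitute figures from your own workload.

```python
# Back-of-envelope sizing for a Kafka Streams RocksDB state store:
# total key space x average key-value pair size x amplification factor.
# Every number here is a hypothetical placeholder -- plug in your own.

num_keys = 10_000_000        # distinct keys seen by your reduceByKey() operator
avg_pair_bytes = 200         # average serialized key + value size in bytes
space_amplification = 1.5    # assumed RocksDB space amplification factor

estimated_bytes = num_keys * avg_pair_bytes * space_amplification
print(f"Estimated on-disk state: {estimated_bytes / 1024**3:.2f} GiB")
```

This only bounds the steady-state store size; write amplification affects I/O volume rather than the final on-disk footprint, so estimate it separately if disk throughput is your constraint.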
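P.P.S. On the cron-job question about LOG.old.{timestamp} files: until the configs from KAFKA-3740 are exposed, an external cleanup is one option. Here is a minimal sketch of such a sweep, assuming the rotated info logs live somewhere under your Streams state directory; the directory path and one-week retention below are assumptions to adjust for your deployment.

```python
# Sketch of a cron-style cleanup for RocksDB's rotated info logs
# (the LOG.old.<timestamp> files). Assumes they sit under the Kafka
# Streams state directory; path and retention are illustrative only.
import glob
import os
import time

def clean_rotated_logs(state_dir, max_age_seconds):
    """Delete LOG.old.* files older than max_age_seconds; return what was removed."""
    cutoff = time.time() - max_age_seconds
    removed = []
    pattern = os.path.join(state_dir, "**", "LOG.old.*")
    for path in glob.glob(pattern, recursive=True):
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed

if __name__ == "__main__":
    # Assumed defaults: adjust state_dir to your state.dir setting.
    deleted = clean_rotated_logs("/tmp/kafka-streams", 7 * 24 * 3600)
    print(f"Removed {len(deleted)} rotated RocksDB log file(s)")
```

Note this only touches the rotated info logs, not the current LOG file or any SST/WAL data, so it should be safe to run while the application is up; still, verify against your own layout first.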