You might want to start by increasing the commit interval, if you can handle some additional latency. I would bet that the frequent flushing is a major part of your problem: not just the act of flushing itself, but its consequences for the structure of the data in each RocksDB instance. If you flush unfilled memtables, you end up with a large number of small L0 files that then have to be compacted, and until they are, iterators and seeks are less effective. Frequent flushing also means the memtable is less effective as a write cache, so you miss out on some immediate deduplication of updates to the same key.
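In case a concrete sketch helps, here's roughly what that looks like in StreamsConfig terms. The application id, bootstrap servers, class name, and the 30s value are just placeholders for illustration, not tuned recommendations:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class LowChurnStreamsProps {
        public static Properties build() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
            // Commit (and, before 2.7, flush) every 30s instead of every 1s,
            // so memtables have a chance to fill before they're written out:
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000L);
            // Keep the record cache large enough to dedupe repeated updates to
            // hot keys before they reach RocksDB (you already have 512MB):
            props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 512L * 1024 * 1024);
            return props;
        }
    }

The tradeoff is exactly the latency one mentioned above: with caching enabled, records can sit in the cache for up to the commit interval before being forwarded downstream.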
There's been some recent work to decouple flushing from committing, so starting in 2.7 you shouldn't have to choose between low latency and cache/RocksDB performance. That release is still in progress, but I'd recommend checking it out when you can.

I'm not sure what version you're using, but in 2.5 we added some RocksDB metrics that could be useful for further insight. I think they're all recorded at the DEBUG level (there's a short config snippet after the quoted message below for turning that on). Might be worth investigating. We also recently added some additional metrics that expose properties of RocksDB, which will likewise be available in the upcoming 2.7 release.

Cheers,
Sophie

On Tue, Oct 27, 2020 at 1:49 PM Giselle van Dongen <giselle.vandon...@ugent.be> wrote:

> Hi all,
>
> We have a Kafka Streams job which has high CPU utilization. When profiling
> the job, we saw that this was for a large part due to RocksDB methods:
> flush, seek, put, get, iteratorCF. We use the default settings for our
> RocksDB state store. Which configuration parameters are most important to
> tune to lower CPU usage? Most documentation focuses on memory as the
> bottleneck.
>
> Our job does a join and window step. The commit interval is 1 second. We
> enabled caching and the cache is 512MB large. We have 6 instances of 6 CPU
> and 30 GB RAM.
>
> Thank you for any help!
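P.S. For the DEBUG-level metrics mentioned above, a one-line addition to the same Properties sketch should do it; metrics.recording.level is the standard client config, and "DEBUG" is the level the per-store RocksDB metrics are recorded at:

    // Added to the props from the earlier sketch:
    props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");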