[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841452#comment-16841452 ]
Pavel Savov commented on KAFKA-8367:
------------------------------------

Hi [~vvcephei] and [~ableegoldman],

Thank you for your suggestions. Yes, we are using a RocksDB Config Setter (and had been using one before upgrading to 2.2.0 as well). The only object we create in that setter is an org.rocksdb.BlockBasedTableConfig instance:

{code:java}
val tableConfig = new org.rocksdb.BlockBasedTableConfig()
tableConfig.setBlockCacheSize(blockCacheSize) // block_cache_size (fetch cache)
tableConfig.setBlockSize(DefaultBlockSize)
tableConfig.setCacheIndexAndFilterBlocks(DefaultCacheIndexAndFilterBlocks)
options.setTableFormatConfig(tableConfig)
{code}

(A sketch of how a setter like this is typically wired into Streams follows at the end of this message.)

I tried building from the latest trunk, but I'm afraid it did not fix the leak. Please let me know if there is any information I can provide that would help narrow down the issue.

Thanks!

> Non-heap memory leak in Kafka Streams
> -------------------------------------
>
>                 Key: KAFKA-8367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8367
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.2.0
>            Reporter: Pavel Savov
>            Priority: Major
>         Attachments: memory-prod.png, memory-test.png
>
>
> We have been observing a non-heap memory leak after upgrading to Kafka Streams 2.2.0 from 2.0.1. We suspect the source to be around RocksDB, as the leak only happens when we enable stateful stream operations (utilizing stores). We are aware of *KAFKA-8323*, have created our own fork of 2.2.0, and have ported the fix scheduled for release in 2.2.1 to our fork. It did not stop the leak, however.
> We are seeing this memory leak both in our production environment, where the consumer group is auto-scaled in and out in response to changes in traffic volume, and in our test environment, where we have two consumers, no autoscaling, and relatively constant traffic.
> Below is some information I hope will be of help:
> * RocksDB Config:
> Block cache size: 4 MiB
> Write buffer size: 2 MiB
> Block size: 16 KiB
> Cache index and filter blocks: true
> Manifest preallocation size: 64 KiB
> Max write buffer number: 3
> Max open files: 6144
>
> * Memory usage in production
> The attached graph (memory-prod.png) shows memory consumption for each instance as a separate line. The horizontal red line at 6 GiB is the memory limit.
> As illustrated on the attached graph from production, memory consumption in running instances goes up around autoscaling events (scaling the consumer group either in or out) and the associated rebalancing. It stabilizes until the next autoscaling event, but it never goes back down.
> An example of scaling out can be seen from around 21:00 hrs, where three new instances are started in response to a traffic spike.
> Just after midnight traffic drops and some instances are shut down; memory consumption in the remaining running instances goes up.
> Memory consumption climbs again from around 6:00 AM due to increased traffic, and new instances keep being started until around 10:30 AM. Memory consumption never drops until the cluster is restarted around 12:30.
>
> * Memory usage in test
> As illustrated by the attached graph (memory-test.png), we have a fixed number of two instances in our test environment and no autoscaling. Memory consumption rises linearly until it reaches the limit (around 2:00 AM on 5/13) and Mesos restarts the offending instances, or we restart the cluster manually.
>
> * No heap leaks observed
> * Window retention: 2 or 11 minutes (depending on operation type)
> * Issue not present in Kafka Streams 2.0.1
> * No memory leak for stateless stream operations (when no RocksDB stores are used)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
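For context on how a setter like the one quoted in the comment above is usually plugged into Kafka Streams, here is a minimal sketch in Scala. It is not the reporter's actual code: the class and object names (CustomRocksDBConfig, StreamsSetup) are hypothetical, the literal values simply mirror the "RocksDB Config" bullet from the issue description, and the sketch assumes the 2.2-era RocksDBConfigSetter interface, which exposes only setConfig.

{code:scala}
import java.util.Properties

import org.apache.kafka.streams.StreamsConfig
import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, Options}

// Hypothetical setter: names and literals are illustrative only, mirroring
// the values listed in the issue description above.
class CustomRocksDBConfig extends RocksDBConfigSetter {
  override def setConfig(storeName: String,
                         options: Options,
                         configs: java.util.Map[String, AnyRef]): Unit = {
    val tableConfig = new BlockBasedTableConfig()
    tableConfig.setBlockCacheSize(4L * 1024 * 1024)   // block cache: 4 MiB
    tableConfig.setBlockSize(16L * 1024)              // block size: 16 KiB
    tableConfig.setCacheIndexAndFilterBlocks(true)    // cache index and filter blocks
    options.setTableFormatConfig(tableConfig)

    options.setWriteBufferSize(2L * 1024 * 1024)      // write buffer: 2 MiB
    options.setMaxWriteBufferNumber(3)                // max write buffer number
    options.setMaxOpenFiles(6144)                     // max open files
    options.setManifestPreallocationSize(64L * 1024)  // manifest preallocation: 64 KiB
  }
}

object StreamsSetup {
  // Register the setter via the standard StreamsConfig property.
  def baseProps(): Properties = {
    val props = new Properties()
    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, classOf[CustomRocksDBConfig])
    props
  }
}
{code}

Streams instantiates the configured class reflectively and calls setConfig for each RocksDB store it opens.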