I don't know much about the performance improvements that bloom filters may bring, but I believe you can also improve RocksDB performance by increasing the managed memory that RocksDB uses, via taskmanager.memory.managed.fraction: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-managed-fraction
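
As a rough sketch of where these settings would go, assuming they are set in the cluster's Flink configuration file (flink-conf.yaml): the 0.6 value below is purely illustrative and not a tuned recommendation (the documented default is 0.4), and the bloom filter line is only there to show the option from the original question next to it.

```yaml
# Illustrative values only; measure before and after changing them.

# Fraction of total Flink memory reserved as managed memory,
# which the RocksDB state backend uses (default: 0.4).
taskmanager.memory.managed.fraction: 0.6

# Option asked about in the original question (default: false).
state.backend.rocksdb.use-bloom-filter: true
```

Note that managed memory is carved out of the total Flink memory, so raising the fraction leaves less for the JVM heap and the other memory components unless the total TaskManager memory is increased as well.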
From: Kenan Kılıçtepe <kkilict...@gmail.com>
Date: Friday, 20 October 2023 at 14:51
To: user <user@flink.apache.org>
Subject: [EXTERNAL] Bloom Filter for Rocksdb

Can someone tell me the exact performance effect of enabling the bloom filter? Could enabling it cause unpredictable performance problems? I have read what it is and how it works, and it makes sense, but I also wonder why the default value of state.backend.rocksdb.use-bloom-filter is false.

We have a 5-server Flink cluster processing real-time IoT data from 5 million devices, and for many jobs we keep separate state for each device. Sometimes we have performance issues, and when I check the flame graph on the test server I always see rocksdb.get() as the bottleneck. I just want to improve RocksDB performance.

Thanks