I believe bloom filters are off by default because they add overhead and
aren't always helpful. For example, in write-heavy workloads with few
reads, bloom filters aren't worth the overhead.
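
To put rough numbers on that overhead, here is a back-of-the-envelope sketch
using the textbook Bloom filter approximation. It is not RocksDB's actual
implementation, whose filter formats differ in detail. The 10 bits per key
and the helper names are assumptions for illustration only; the 5 million
keys come from the device count mentioned below, assuming one key per device
in a single state.

```python
import math


def optimal_num_hashes(bits_per_key: float) -> int:
    """Number of hash probes that minimizes false positives: k = b * ln 2."""
    return max(1, round(bits_per_key * math.log(2)))


def false_positive_rate(bits_per_key: float, num_hashes: int) -> float:
    """Textbook approximation (1 - e^(-k/b))^k for b bits per key, k hashes."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def filter_bytes(num_keys: int, bits_per_key: float) -> int:
    """Raw filter size in bytes, ignoring block and metadata overhead."""
    return math.ceil(num_keys * bits_per_key / 8)


if __name__ == "__main__":
    num_keys = 5_000_000   # one key per device (assumption: a single state)
    bits_per_key = 10      # assumed setting, not read from any configuration

    k = optimal_num_hashes(bits_per_key)
    fpr = false_positive_rate(bits_per_key, k)
    size_mib = filter_bytes(num_keys, bits_per_key) / (1024 * 1024)

    print(f"hash probes per lookup: {k}")
    print(f"approx. false positive rate: {fpr:.4%}")
    print(f"approx. filter size: {size_mib:.2f} MiB")
```

The real footprint depends on how many states and SST files each job has, so
treat the result as an order-of-magnitude estimate rather than a prediction.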

David

On Fri, Oct 20, 2023 at 11:31 AM Mate Czagany <czmat...@gmail.com> wrote:

> Hi,
>
> There have been no reports about setting this configuration causing any
> issues. I would guess it's off by default because it can increase the
> memory usage by an unpredictable amount.
>
> I would say feel free to enable it; from what you've said, I also think
> that this would improve the performance of your jobs. But make sure to
> configure your jobs so that they can accommodate the potential
> memory footprint growth. Also, please read the following resources to learn
> more about RocksDB's bloom filter:
> https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter
> https://rocksdb.org/blog/2014/09/12/new-bloom-filter-format.html
>
> Regards,
> Mate
>
>
> On Fri, Oct 20, 2023 at 15:50, Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
>
>> Can someone tell me the exact performance effect of enabling the bloom
>> filter? Could enabling it cause unpredictable performance problems?
>>
>> I read what it is and how it works, and it makes sense, but I also asked
>> myself why the default value of state.backend.rocksdb.use-bloom-filter is
>> false.
>>
>> We have a 5-server Flink cluster processing real-time IoT data coming
>> from 5 million devices, and for a lot of jobs we keep different states for
>> each device.
>>
>> Sometimes we have performance issues, and when I check the flame graph on
>> the test server I always see rocksdb.get() as the bottleneck. I just want
>> to improve RocksDB performance.
>>
>> Thanks
>>
>>
