[jira] [Updated] (KAFKA-12748) Explore new RocksDB options to consider enabling by default

A. Sophie Blee-Goldman (Jira) Tue, 04 May 2021 16:18:06 -0700


     [ https://issues.apache.org/jira/browse/KAFKA-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


A. Sophie Blee-Goldman updated KAFKA-12748:
-------------------------------------------
    Description: 
With the RocksDB version bump come many new options, some of which look 
interesting enough to explore for use in Streams. We should try enabling these 
by default and run the benchmarks to look for any performance improvement 
(or regression). See the javadocs for all Options 
[here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]


Options.setAvoidUnnecessaryBlockingIO: 
    - As the name suggests, avoids blocking on long-latency tasks by 
scheduling a background job to do them

Options.setSkipCheckingSstFileSizesOnDbOpen:
    - Speeds up startup when there are many SST files, which could mean less 
overhead from things like rebalancing, where tasks are migrated between clients 
or threads. Not sure how many SST files count as "many"; this may be less 
useful now that we've disabled bulk loading

Options.setBestEffortsRecovery: 
    - Interesting feature that allows recovering from missing files without 
using the WAL. Could be useful if the on-disk state is corrupted (e.g. a user 
deletes a file), since we would not need to rebuild state from scratch. I'd 
want to dig in further to understand exactly what it does and does not do. Not 
a performance improvement, but we should run the benchmarks to make sure it 
doesn't make performance worse.

Options.setWriteDbidToManifest:
    - Should be set to true if/when we ever need to rely on the DB id, e.g. for 
backups. Also not a performance improvement, but we should still benchmark it.



Options.optimizeForSmallDb:
    - This one is definitely not something we should set by default, as "small" 
here means under 1GB. But it's probably worth at least calling out in the docs 
for those users who know their data set size (per store) is under a GB
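
For anyone who wants to experiment before any defaults change, the options above can be set through a custom RocksDBConfigSetter. The following is only a configuration sketch: the class name ExperimentalRocksDBConfig is hypothetical, it assumes the rocksdbjni version on the classpath already exposes these setters, and turning every flag on is for benchmarking purposes, not a recommendation.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Hypothetical config setter for trying out the options from this ticket.
public class ExperimentalRocksDBConfig implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Candidate defaults under discussion; enable one at a time when
        // benchmarking so that any regression can be attributed.
        options.setAvoidUnnecessaryBlockingIO(true);
        options.setSkipCheckingSstFileSizesOnDbOpen(true);
        options.setBestEffortsRecovery(true);
        options.setWriteDbidToManifest(true);

        // Not a candidate default: only for stores known to stay under ~1GB.
        // options.optimizeForSmallDb();
    }

    @Override
    public void close(final String storeName, final Options options) {
        // No RocksDB objects were created in setConfig, so nothing to release.
    }
}
```

The class would then be registered in the Streams properties, for example with props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, ExperimentalRocksDBConfig.class).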

  was:
With the rocksdb version bump comes a lot of new options, some of which look 
interesting enough to explore for usage in Streams. We should try setting these 
as default options and run the benchmarks to look for any performance benefit 
(or decrease). See javadocs for all Options 
[here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]


Options.setAvoidUnnecessaryBlockingIO: 
    - As the name suggest, avoids blocking/long-latency tasks by scheduling a 
background job to do it

 Options.setBestEffortsRecovery: 
    - Interesting feature to allow recovering missing files without the use of 
the WAL. Could be useful if the on-disk state is corrupted (eg user deletes a 
file) without needing to rebuild state from scratch. Though I'd want to dig in 
further to understand what exactly it does and does not do. Not a performance 
improvement but we should run the benchmarks to make sure it doesn't make the 
performance worse.

Options.setWriteDbidToManifest:
    - Should be set to true if/when we ever need to rely on the DB id eg for 
backups. Also not a performance improvement but we should still benchmark this.


Options.optimizeForSmallDb:
    - This one is definitely not something we should set by default, as "small" 
here means under 1GB. But it's probably worth at least calling out in the docs 
for those users who know their data set size (per store) is under a GB


> Explore new RocksDB options to consider enabling by default
> -----------------------------------------------------------
>
>                 Key: KAFKA-12748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12748
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> With the RocksDB version bump come many new options, some of which look 
> interesting enough to explore for use in Streams. We should try enabling 
> these by default and run the benchmarks to look for any performance 
> improvement (or regression). See the javadocs for all Options 
> [here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]
> Options.setAvoidUnnecessaryBlockingIO: 
>     - As the name suggests, avoids blocking on long-latency tasks by 
> scheduling a background job to do them
> Options.setSkipCheckingSstFileSizesOnDbOpen:
>     - Speeds up startup when there are many SST files, which could mean less 
> overhead from things like rebalancing, where tasks are migrated between 
> clients or threads. Not sure how many SST files count as "many"; this may be 
> less useful now that we've disabled bulk loading 
> Options.setBestEffortsRecovery: 
>     - Interesting feature that allows recovering from missing files without 
> using the WAL. Could be useful if the on-disk state is corrupted (e.g. a user 
> deletes a file), since we would not need to rebuild state from scratch. I'd 
> want to dig in further to understand exactly what it does and does not do. 
> Not a performance improvement, but we should run the benchmarks to make sure 
> it doesn't make performance worse.
> Options.setWriteDbidToManifest:
>     - Should be set to true if/when we ever need to rely on the DB id, e.g. 
> for backups. Also not a performance improvement, but we should still 
> benchmark it.
> Options.optimizeForSmallDb:
>     - This one is definitely not something we should set by default, as 
> "small" here means under 1GB. But it's probably worth at least calling out in 
> the docs for those users who know their data set size (per store) is under a 
> GB
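
One possible way to organize the benchmark runs proposed in this ticket is to compare a baseline against each candidate option enabled on its own, plus one run with all of them enabled, so that any change in performance can be attributed to a single option. The sketch below only enumerates those runs; the class name and the short option labels are made up for illustration, and it does not invoke any actual benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that lists which option sets to benchmark.
final class BenchmarkMatrix {

    // Candidate defaults from this ticket. optimizeForSmallDb is left out
    // because it is explicitly not proposed as a default.
    static final List<String> CANDIDATES = List.of(
        "avoidUnnecessaryBlockingIO",
        "skipCheckingSstFileSizesOnDbOpen",
        "bestEffortsRecovery",
        "writeDbidToManifest");

    // Returns the runs in order: baseline, each option alone, then all together.
    public static List<List<String>> variants() {
        final List<List<String>> runs = new ArrayList<>();
        runs.add(Collections.emptyList());
        for (final String option : CANDIDATES) {
            runs.add(List.of(option));
        }
        runs.add(CANDIDATES);
        return runs;
    }

    public static void main(final String[] args) {
        for (final List<String> run : variants()) {
            System.out.println(run.isEmpty() ? "baseline" : String.join(" + ", run));
        }
    }
}
```

Running each option in isolation costs more benchmark time than a single all-on run, but a regression in the combined run would otherwise be hard to trace back to one setting.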



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
