[
https://issues.apache.org/jira/browse/KAFKA-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A. Sophie Blee-Goldman updated KAFKA-12748:
-------------------------------------------
Description:
With the rocksdb version bump comes a lot of new options, some of which look
interesting enough to explore for usage in Streams. We should try setting these
as default options and run the benchmarks to look for any performance benefit
(or decrease). See javadocs for all Options
[here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]
Options.setAvoidUnnecessaryBlockingIO:
- As the name suggests, avoids blocking/long-latency tasks by scheduling a
background job to do them instead
Options.setSkipCheckingSstFileSizesOnDbOpen:
- Speeds up startup time if there are many sst files, which could mean less
overhead from things like rebalancing, where tasks are migrated between clients
or threads. Not sure how many sst files count as "many"; this may be less useful
now that we've disabled bulk loading
Options.setBestEffortsRecovery:
- Interesting feature to allow recovering missing files without the use of
the WAL. Could be useful if the on-disk state is corrupted (e.g. a user deletes
a file), since we would not need to rebuild state from scratch. Though I'd want
to dig in further to understand what exactly it does and does not do. Not a
performance improvement, but we should run the benchmarks to make sure it
doesn't make performance worse.
Options.setWriteDbidToManifest:
- Should be set to true if/when we ever need to rely on the DB id, e.g. for
backups. Also not a performance improvement, but we should still benchmark this.
Options.optimizeForSmallDb:
- This one is definitely not something we should set by default, as "small"
here means under 1GB. But it's probably worth at least calling out in the docs
for those users who know their data set size (per store) is under a GB
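
For anyone who wants to try these before any defaults change, here is a rough
sketch of setting the options above through a custom RocksDBConfigSetter. The
class name CandidateOptionsConfigSetter is made up for illustration, and this
assumes a Streams version that already bundles the bumped rocksdb. It is not
what Streams does by default, and for benchmarking it probably makes sense to
toggle one option at a time.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Hypothetical class for experimentation; not part of Kafka Streams.
public class CandidateOptionsConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Hand blocking/long-latency work off to a background job.
        options.setAvoidUnnecessaryBlockingIO(true);

        // Skip the sst file size check when opening the DB.
        options.setSkipCheckingSstFileSizesOnDbOpen(true);

        // Try to recover from missing files without relying on the WAL.
        options.setBestEffortsRecovery(true);

        // Record the DB id in the MANIFEST, e.g. for backups.
        options.setWriteDbidToManifest(true);

        // Only for stores known to be under 1GB; left off on purpose.
        // options.optimizeForSmallDb();
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing to release: setConfig allocates no RocksDB objects.
    }
}
```

The class would then be registered in the Streams configuration with
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
CandidateOptionsConfigSetter.class).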
was:
With the rocksdb version bump comes a lot of new options, some of which look
interesting enough to explore for usage in Streams. We should try setting these
as default options and run the benchmarks to look for any performance benefit
(or decrease). See javadocs for all Options
[here|https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Options.html]
Options.setAvoidUnnecessaryBlockingIO:
- As the name suggests, avoids blocking/long-latency tasks by scheduling a
background job to do them instead
Options.setBestEffortsRecovery:
- Interesting feature to allow recovering missing files without the use of
the WAL. Could be useful if the on-disk state is corrupted (e.g. a user deletes
a file), since we would not need to rebuild state from scratch. Though I'd want
to dig in further to understand what exactly it does and does not do. Not a
performance improvement, but we should run the benchmarks to make sure it
doesn't make performance worse.
Options.setWriteDbidToManifest:
- Should be set to true if/when we ever need to rely on the DB id, e.g. for
backups. Also not a performance improvement, but we should still benchmark this.
Options.optimizeForSmallDb:
- This one is definitely not something we should set by default, as "small"
here means under 1GB. But it's probably worth at least calling out in the docs
for those users who know their data set size (per store) is under a GB
> Explore new RocksDB options to consider enabling by default
> -----------------------------------------------------------
>
> Key: KAFKA-12748
> URL: https://issues.apache.org/jira/browse/KAFKA-12748
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Priority: Major