[ 
https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi resolved FLINK-17288.
-----------------------------------
    Resolution: Duplicate

> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
>                 Key: FLINK-17288
>                 URL: https://issues.apache.org/jira/browse/FLINK-17288
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Jun Qin
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
>
> When resources are constrained, loading a large savepoint into RocksDB can 
> take a long time. This also lengthens job recovery when the savepoint is 
> used for recovery.
> Bulk loading from the savepoint should help here. Here is an excerpt from 
> the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to direct insert data to the DB:
>  # using single writer thread and insert in sorted order
>  # batch hundreds of keys into one write batch
>  # use vector memtable
>  # make sure options.max_background_flushes is at least 4
>  # before inserting the data, disable automatic compaction, set 
> options.level0_file_num_compaction_trigger, 
> options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger 
> to very large. After inserting all the data, issue a manual compaction.
> 3-5 will be automatically done if you call Options::PrepareForBulkLoad() to 
> your option
> If you can pre-process the data offline before inserting. There is a faster 
> way: you can sort the data, generate SST files with non-overlapping ranges in 
> parallel and bulkload the SST files. See 
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
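
The "sort the data, generate SST files with non-overlapping ranges" step from the FAQ excerpt can be sketched roughly as below. This is only an illustration of the planning part, not Flink's or RocksDB's implementation: the class BulkLoadPlanner and the method splitIntoRanges are hypothetical names, keys are modelled as String for readability (real RocksDB keys are byte arrays, ordered bytewise by the default comparator), and no RocksDB API is called. In a real restore path, each chunk would presumably be handed to its own SST writer and the resulting files ingested afterwards.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper: plans non-overlapping key ranges for a bulk load. */
public class BulkLoadPlanner {

    /**
     * Splits sorted entries into consecutive chunks of at most
     * maxEntriesPerChunk entries. Because the input is sorted and the
     * chunks are consecutive, every key in chunk i is smaller than every
     * key in chunk i + 1, so the chunk key ranges never overlap.
     */
    public static <K, V> List<List<Map.Entry<K, V>>> splitIntoRanges(
            SortedMap<K, V> sorted, int maxEntriesPerChunk) {
        if (maxEntriesPerChunk <= 0) {
            throw new IllegalArgumentException("maxEntriesPerChunk must be positive");
        }
        List<List<Map.Entry<K, V>>> chunks = new ArrayList<>();
        List<Map.Entry<K, V>> current = new ArrayList<>();
        for (Map.Entry<K, V> e : sorted.entrySet()) {
            // Copy the entry so the chunk does not depend on the source map.
            current.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            if (current.size() == maxEntriesPerChunk) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // trailing, possibly smaller chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted regardless of insertion order.
        SortedMap<String, String> data = new TreeMap<>();
        data.put("e", "5");
        data.put("b", "2");
        data.put("a", "1");
        data.put("d", "4");
        data.put("c", "3");

        List<List<Map.Entry<String, String>>> chunks = splitIntoRanges(data, 2);
        for (int i = 0; i < chunks.size(); i++) {
            List<Map.Entry<String, String>> chunk = chunks.get(i);
            String first = chunk.get(0).getKey();
            String last = chunk.get(chunk.size() - 1).getKey();
            // Each chunk could be written by a separate worker in parallel.
            System.out.println("chunk " + i + ": [" + first + ", " + last + "]");
        }
    }
}
```

Splitting by entry count keeps the sketch short. Splitting by accumulated byte size would likely give more evenly sized files, but the non-overlap property comes only from sorting first and cutting the sorted sequence into consecutive pieces.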



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
