[ https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Somogyi resolved FLINK-17288.
-----------------------------------
    Resolution: Duplicate

> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
>                 Key: FLINK-17288
>                 URL: https://issues.apache.org/jira/browse/FLINK-17288
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Jun Qin
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, pull-request-available
>
> When resources are constrained, loading a big savepoint into RocksDB may take some time. This may also impact the job recovery time when the savepoint is used for recovery.
> Bulk loading from the savepoint should help in this regard. Here is an excerpt from the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to insert data directly into the DB:
> # use a single writer thread and insert in sorted order
> # batch hundreds of keys into one write batch
> # use a vector memtable
> # make sure options.max_background_flushes is at least 4
> # before inserting the data, disable automatic compaction and set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large values. After inserting all the data, issue a manual compaction.
> Steps 3-5 are done automatically if you call Options::PrepareForBulkLoad() on your options.
> If you can pre-process the data offline before inserting, there is a faster way: sort the data, generate SST files with non-overlapping key ranges in parallel, and bulk-load the SST files. See [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
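For reference, a minimal RocksJava sketch of steps 1-5 from the FAQ plus the final manual compaction. This only illustrates the FAQ recommendations, not Flink's actual restore path; the DB path, key layout, and batch size are invented for the example:

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class RocksDbBulkLoadSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        // prepareForBulkLoad() covers steps 3-5: vector memtable, more background
        // flushes, and auto-compaction disabled with raised level0 triggers.
        try (Options options = new Options()
                     .setCreateIfMissing(true)
                     .prepareForBulkLoad();
             RocksDB db = RocksDB.open(options, "/tmp/bulk-load-db");   // invented path
             WriteOptions writeOptions = new WriteOptions()) {

            // Steps 1-2: single writer thread, keys in ascending order,
            // a few hundred keys per write batch.
            WriteBatch batch = new WriteBatch();
            for (int i = 0; i < 1_000_000; i++) {
                batch.put(key(i), value(i));
                if (batch.count() >= 500) {
                    db.write(writeOptions, batch);
                    batch.close();
                    batch = new WriteBatch();
                }
            }
            db.write(writeOptions, batch);
            batch.close();

            // After inserting all the data, issue a manual compaction (step 5).
            db.compactRange();
        }
    }

    // Zero-padded keys so lexicographic order matches insertion order.
    private static byte[] key(int i) {
        return String.format("key-%010d", i).getBytes();
    }

    private static byte[] value(int i) {
        return ("value-" + i).getBytes();
    }
}
{code}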
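The offline alternative from the FAQ (pre-built SST files ingested into the DB) could look roughly like the following in RocksJava, again with invented paths and a single file standing in for several range-partitioned files written in parallel:

{code:java}
import java.util.Collections;
import org.rocksdb.EnvOptions;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

public class RocksDbSstIngestSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        String sstPath = "/tmp/bulk-load-000001.sst";   // invented path

        // Write a sorted, non-overlapping key range into an SST file. With several
        // ranges this step can run in parallel, one writer per range.
        try (Options options = new Options();
             EnvOptions envOptions = new EnvOptions();
             SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            for (int i = 0; i < 1_000_000; i++) {
                // SstFileWriter requires strictly ascending keys.
                writer.put(String.format("key-%010d", i).getBytes(),
                           ("value-" + i).getBytes());
            }
            writer.finish();
        }

        // Ingest the pre-built file(s) into the target DB; setMoveFiles(true)
        // moves the file into the DB directory instead of copying it.
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/bulk-load-db");   // invented path
             IngestExternalFileOptions ingestOptions =
                     new IngestExternalFileOptions().setMoveFiles(true)) {
            db.ingestExternalFile(Collections.singletonList(sstPath), ingestOptions);
        }
    }
}
{code}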