Yue Ma created FLINK-33946:
------------------------------

             Summary: RocksDb sets setAvoidFlushDuringShutdown to true to speed up Task Cancel
                 Key: FLINK-33946
                 URL: https://issues.apache.org/jira/browse/FLINK-33946
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / State Backends
    Affects Versions: 1.19.0
            Reporter: Yue Ma
             Fix For: 1.19.0

When a job fails, its tasks must be canceled and redeployed. While disposing, RocksDBStateBackend calls RocksDB.close, which reaches the following check in RocksDB's shutdown path:

{code:cpp}
if (!shutting_down_.load(std::memory_order_acquire) &&
    has_unpersisted_data_.load(std::memory_order_relaxed) &&
    !mutable_db_options_.avoid_flush_during_shutdown) {
  if (immutable_db_options_.atomic_flush) {
    autovector<ColumnFamilyData*> cfds;
    SelectColumnFamiliesForAtomicFlush(&cfds);
    mutex_.Unlock();
    Status s =
        AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown);
    s.PermitUncheckedError();  // TODO: What to do on error?
    mutex_.Lock();
  } else {
    for (auto cfd : *versions_->GetColumnFamilySet()) {
      if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) {
        cfd->Ref();
        mutex_.Unlock();
        Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown);
        s.PermitUncheckedError();  // TODO: What to do on error?
        mutex_.Lock();
        cfd->UnrefAndTryDelete();
      }
    }
  }
}
{code}

By default (avoid_flush_during_shutdown=false), RocksDB flushes its memtables on close. When disk pressure is high or the memtables are large, this flush can take a long time, which leaves the task stuck in the CANCELING state and slows down job failover.

The flush is unnecessary when a Flink task closes, because the data can be recovered from the checkpoint. We can therefore set avoid_flush_during_shutdown to true to speed up task failover.
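
To make the effect of the flag concrete, here is a minimal, self-contained sketch that models the decision in the excerpt above. It is an illustration only, not RocksDB or Flink code: the class `ShutdownFlushModel`, the type `ColumnFamilyState`, and the method `columnFamiliesFlushedOnClose` are hypothetical names, and only the non-atomic-flush branch is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShutdownFlushModel {

    /** Hypothetical stand-in for the per-column-family state checked in the excerpt. */
    static final class ColumnFamilyState {
        final String name;
        final boolean dropped;
        final boolean initialized;
        final boolean memtableEmpty;

        ColumnFamilyState(String name, boolean dropped, boolean initialized, boolean memtableEmpty) {
            this.name = name;
            this.dropped = dropped;
            this.initialized = initialized;
            this.memtableEmpty = memtableEmpty;
        }
    }

    /**
     * Returns the names of the column families whose memtables would be flushed on close,
     * mirroring the conditions of the non-atomic-flush branch in the excerpt.
     */
    static List<String> columnFamiliesFlushedOnClose(
            boolean shuttingDown,
            boolean hasUnpersistedData,
            boolean avoidFlushDuringShutdown,
            List<ColumnFamilyState> columnFamilies) {
        List<String> flushed = new ArrayList<>();
        // Outer guard: no flush at all if the flag is set, nothing is unpersisted,
        // or shutdown has already been started.
        if (shuttingDown || !hasUnpersistedData || avoidFlushDuringShutdown) {
            return flushed;
        }
        for (ColumnFamilyState cf : columnFamilies) {
            // Only live, initialized column families with a non-empty memtable are flushed.
            if (!cf.dropped && cf.initialized && !cf.memtableEmpty) {
                flushed.add(cf.name);
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<ColumnFamilyState> cfs = Arrays.asList(
                new ColumnFamilyState("default", false, true, true),
                new ColumnFamilyState("value-state", false, true, false),
                new ColumnFamilyState("dropped-state", true, true, false));

        // Default behaviour (flag = false): the non-empty memtable is flushed.
        System.out.println(columnFamiliesFlushedOnClose(false, true, false, cfs)); // [value-state]
        // Proposed behaviour (flag = true): close skips every flush.
        System.out.println(columnFamiliesFlushedOnClose(false, true, true, cfs));  // []
    }
}
```

With the flag set, the close path does no flush work regardless of memtable size, which is the saving this issue is after. In RocksJava the flag is exposed through the setAvoidFlushDuringShutdown setter named in the summary.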