[ https://issues.apache.org/jira/browse/FLINK-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564890#comment-17564890 ]
ming li commented on FLINK-28390:
---------------------------------

Hi, [~masteryhx], [~Zhanghao Chen]

Yes, although we currently have a compaction configuration for FIFO, it is actually unusable (the TTL and MAX_SIZE of FIFO cannot be configured). In addition, we do not recommend that users use it, because of potential data loss.

So I think we have the following work to do:
1. Add the FIFO-related JNI; we can refer to https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style;
2. Add documentation and precautions for using FIFO.

In addition, when we used RocksDB's FIFO compaction internally, we also found a potential bug, which also needs to be fixed on Flink's RocksDB branch. We can refer to https://github.com/facebook/rocksdb/issues/10133

> Allows RocksDB to configure FIFO Compaction to reduce CPU overhead.
> -------------------------------------------------------------------
>
>                 Key: FLINK-28390
>                 URL: https://issues.apache.org/jira/browse/FLINK-28390
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: ming li
>            Priority: Major
>
> We know that the FIFO compaction strategy may silently delete data, which
> can mean data loss for the business. But in some scenarios, FIFO compaction
> can be a very effective way to reduce CPU usage.
>
> Flink's TaskManagers are usually small-scale processes, for example
> allocated 4 CPUs and 16G of memory. When the state is small, RocksDB's CPU
> overhead is not high, but as the state grows, RocksDB may be compacting
> frequently, which occupies a large amount of CPU and affects the
> computation.
>
> We usually configure a TTL for the state, so when using FIFO we can configure
> its TTL to be slightly longer than the state TTL, so that the upper layer
> behaves the same as before.
>
> Although the FIFO compaction strategy may bring space amplification, disk is
> cheaper than CPU after all, so the overall cost is reduced.
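
To make the trade-off in the description concrete, here is a minimal Python sketch of a simplified FIFO policy. It is a toy model, not RocksDB's implementation or any Flink API: the names `SstFile`, `fifo_compact`, and `fifo_ttl_for_state_ttl`, and the 10% margin, are hypothetical and chosen only for illustration. It shows that a FIFO TTL slightly longer than the state TTL only drops files whose entries have already expired, while a size cap that is too small can drop files that still hold live state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SstFile:
    created_at: int  # creation time in seconds (toy clock)
    size: int        # file size in bytes


def fifo_ttl_for_state_ttl(state_ttl_seconds: int, margin_percent: int = 10) -> int:
    """Pick a FIFO TTL slightly longer than the state TTL (assumed 10% margin)."""
    return state_ttl_seconds + state_ttl_seconds * margin_percent // 100


def fifo_compact(
    files: List[SstFile], now: int, ttl_seconds: int, max_total_size: int
) -> Tuple[List[SstFile], List[SstFile]]:
    """Simplified FIFO: drop whole files, oldest first.

    A file is dropped if it is older than ttl_seconds, or while the
    total size of the remaining files exceeds max_total_size.
    Returns (kept, dropped).
    """
    ordered = sorted(files, key=lambda f: f.created_at)
    dropped = [f for f in ordered if now - f.created_at > ttl_seconds]
    kept = [f for f in ordered if now - f.created_at <= ttl_seconds]
    # Size cap: keep evicting the oldest remaining file until under the limit.
    while kept and sum(f.size for f in kept) > max_total_size:
        dropped.append(kept.pop(0))
    return kept, dropped


if __name__ == "__main__":
    state_ttl = 3600
    fifo_ttl = fifo_ttl_for_state_ttl(state_ttl)
    files = [SstFile(0, 100), SstFile(1000, 100), SstFile(3000, 100), SstFile(3900, 100)]

    # Generous size cap: only the file older than the FIFO TTL is dropped.
    kept, dropped = fifo_compact(files, now=4000, ttl_seconds=fifo_ttl, max_total_size=10**9)
    print("dropped by TTL only:", [f.created_at for f in dropped])

    # Tight size cap: a file younger than the state TTL is dropped as well,
    # which is the silent data loss the issue warns about.
    kept, dropped = fifo_compact(files, now=4000, ttl_seconds=fifo_ttl, max_total_size=250)
    print("dropped with tight cap:", [f.created_at for f in dropped])
```

In this model the second call drops the file created at t=1000, which is only 3000 seconds old and therefore still within the 3600-second state TTL. This is why both the TTL and the maximum size need to be configurable, and why the precautions need to be documented.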

--
This message was sent by Atlassian Jira
(v8.20.10#820010)