[ https://issues.apache.org/jira/browse/FLINK-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564959#comment-17564959 ]

ming li commented on FLINK-28390:
---------------------------------

[~martijnvisser] We generally set a TTL on the state. If the MAX_SIZE of the 
FIFO is configured to be unlimited and the file TTL is set to the state TTL, 
this should not cause data loss. At the same time, each KV no longer needs its 
own TTL field, which reduces both the state size and the serialization 
overhead.
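To make the argument above concrete, here is a toy Python model (not RocksDB's or Flink's code) of FIFO compaction with an unlimited size cap and a file TTL. It assumes that a file's age is measured from its flush time, which is never earlier than the write time of any record in it, and that the state TTL is counted from the last write (as with Flink's `OnCreateAndWrite` update type). The names `SstFile`, `fifo_ttl_compact`, and `live_records_lost` are hypothetical. Under those assumptions, a file older than the state TTL can only hold already-expired values, so a file TTL equal to (or longer than) the state TTL drops nothing that is still live.

```python
from dataclasses import dataclass


@dataclass
class SstFile:
    """Hypothetical, simplified SST file: records flushed together at one time."""
    flush_time: int
    records: dict  # key -> write timestamp of the value stored in this file


def fifo_ttl_compact(files, now, file_ttl):
    """Drop every file older than file_ttl, as a size-unlimited FIFO would.

    Assumption: a file's age is measured from its flush time, which is never
    earlier than the write time of any record inside it.
    """
    return [f for f in files if now - f.flush_time <= file_ttl]


def live_records_lost(before, after, now, state_ttl):
    """Return (key, write_time) pairs that were still unexpired but got dropped."""
    kept = {(k, t) for f in after for k, t in f.records.items()}
    return [
        (k, t)
        for f in before
        for k, t in f.records.items()
        if now - t <= state_ttl and (k, t) not in kept
    ]


STATE_TTL = 10
NOW = 20
files = [
    SstFile(flush_time=5, records={"a": 1, "b": 4}),
    SstFile(flush_time=12, records={"c": 9, "a": 11}),  # newer value for "a"
    SstFile(flush_time=18, records={"d": 15}),
]

# File TTL equal to the state TTL: only the file flushed at t=5 is dropped,
# and every record in it had already expired.
kept = fifo_ttl_compact(files, NOW, file_ttl=STATE_TTL)
print(live_records_lost(files, kept, NOW, STATE_TTL))  # []

# A file TTL shorter than the state TTL would drop a still-live value.
kept_short = fifo_ttl_compact(files, NOW, file_ttl=5)
print(live_records_lost(files, kept_short, NOW, STATE_TTL))  # [('a', 11)]
```

Choosing a file TTL slightly longer than the state TTL, as the issue description suggests, only widens this safety margin in the model.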


I think FIFO may not be suitable when the state has no TTL configured. If 
every state is configured with a TTL, FIFO looks like a good choice.
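Conversely, when state has no TTL, every value stays logically live, so anything FIFO evicts is lost data. The toy model below (hypothetical names, not RocksDB code) caps the total file size, as a finite MAX_SIZE would, and shows the oldest file's keys disappearing even though none of them has expired.

```python
from collections import deque


def fifo_size_compact(files, max_total_size):
    """Hypothetical size-capped FIFO: evict the oldest files until the total fits.

    `files` is ordered oldest first; each entry is a (name, size, keys) tuple.
    """
    kept = deque(files)
    total = sum(size for _, size, _ in kept)
    while kept and total > max_total_size:
        _, size, _ = kept.popleft()  # FIFO always drops the oldest file first
        total -= size
    return list(kept)


# Without a state TTL, every key is still live, no matter how old its file is.
files = [
    ("000001.sst", 64, {"user-1", "user-2"}),
    ("000002.sst", 64, {"user-3"}),
    ("000003.sst", 64, {"user-4"}),
]
remaining = fifo_size_compact(files, max_total_size=128)
live_keys = set().union(*(keys for _, _, keys in files))
visible_keys = set().union(*(keys for _, _, keys in remaining))
lost = sorted(live_keys - visible_keys)
print(lost)  # ['user-1', 'user-2']
```

With an unlimited cap nothing is evicted, but the files then grow without bound, so in this model FIFO only pays off for state that carries a TTL.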

> Allows RocksDB to configure FIFO Compaction to reduce CPU overhead.
> -------------------------------------------------------------------
>
>                 Key: FLINK-28390
>                 URL: https://issues.apache.org/jira/browse/FLINK-28390
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: ming li
>            Priority: Major
>
> We know that the FIFO compaction strategy may silently delete data, which 
> can mean data loss for the business. But in some scenarios, FIFO compaction 
> can be a very effective way to reduce CPU usage.
>  
> Flink TaskManagers are usually small processes, for example with 4 CPUs and 
> 16 GB of memory. When the state is small, RocksDB's CPU overhead is low, but 
> as the state grows, RocksDB may spend much of its time compacting, which 
> consumes a large share of the CPU and slows down the actual computation.
>  
> We usually configure a TTL for the state, so when using FIFO we can set the 
> FIFO TTL slightly longer than the state TTL; the upper layer then sees the 
> same behavior as before.
>  
> Although FIFO compaction may cause space amplification, disk is cheaper than 
> CPU after all, so the overall cost goes down.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
