[ https://issues.apache.org/jira/browse/FLINK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892488#comment-17892488 ]
Eaugene Thomas commented on FLINK-36429:
----------------------------------------

Hi, if no one is working on this, I am happy to take it up.

> Enhancing Flink History Server File Storage and Retrieval with RocksDB
> ----------------------------------------------------------------------
>
>                 Key: FLINK-36429
>                 URL: https://issues.apache.org/jira/browse/FLINK-36429
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.20.0
>            Reporter: Xiaowen Sun
>            Priority: Major
>              Labels: historyserver
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Currently, when a Flink job finishes, it writes an archive as a single file
> that maps paths to JSON files. Flink History Server (FHS) job archives are
> pulled to the local machine where the FHS is running, and this process
> creates a local directory that expands based on the contents of the single
> archive file. Because of how the FHS stores the files, a large number of
> directories are created in the local file system. This system can become
> inefficient and slow as the volume of job archives increases, creating
> bottlenecks in job data navigation and retrieval.
> To illustrate the problem of inode usage, let’s consider a scenario where
> there are 5000 subtasks. Each subtask creates its own directory, and within
> each subtask directory, there are additional directories that might store
> only a single file. This structure rapidly increases the number of inodes
> consumed.
> Integrating RocksDB, a high-performance embedded database for key-value data,
> aims to resolve these issues by offering faster data access and better
> scalability. This integration is expected to significantly enhance the
> operational efficiency of FHS by allowing faster data retrieval and enabling
> a larger cache on local Kubernetes deployments, thus overcoming inode
> limitations.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)