[ 
https://issues.apache.org/jira/browse/FLINK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892488#comment-17892488
 ] 

Eaugene Thomas commented on FLINK-36429:
----------------------------------------

Hi, if no one is working on this, I am happy to take it up.

> Enhancing Flink History Server File Storage and Retrieval with RocksDB
> ----------------------------------------------------------------------
>
>                 Key: FLINK-36429
>                 URL: https://issues.apache.org/jira/browse/FLINK-36429
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.20.0
>            Reporter: Xiaowen Sun
>            Priority: Major
>              Labels: historyserver
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Currently, when a Flink job finishes, it writes an archive as a single file 
> that maps paths to JSON files. The Flink History Server (FHS) pulls job 
> archives to the machine it runs on and expands each archive file into a 
> local directory tree.
> Because of how the FHS stores these files, a large number of directories 
> is created in the local file system. This layout becomes inefficient and 
> slow as the volume of job archives grows, creating bottlenecks in job data 
> navigation and retrieval.
> To illustrate the problem of inode usage, let’s consider a scenario where 
> there are 5000 subtasks. Each subtask creates its own directory, and within 
> each subtask directory, there are additional directories that might store 
> only a single file. This structure rapidly increases the number of inodes 
> consumed.
> Integrating RocksDB, a high-performance embedded key-value database, is 
> intended to resolve these issues by offering faster data access and better 
> scalability. The integration is expected to significantly improve the 
> operational efficiency of the FHS by allowing faster data retrieval and 
> enabling a larger cache on local Kubernetes deployments, thus overcoming 
> inode limitations.
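
To make the inode argument in the description concrete, here is a minimal Java sketch that compares the number of file-system entries created by expanding an archive into directories with the number of entries needed when each path is stored as a key. The path layout (/jobs/<jobId>/subtasks/<n>/metrics), the class name, and the method names are hypothetical and chosen only for illustration. The TreeMap is a stand-in for a key-value store and is not the RocksDB API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ArchiveLayoutSketch {

    // Builds a hypothetical archive: one JSON document per subtask, keyed by
    // its path. The path layout is an assumption, not the real FHS layout.
    static Map<String, String> buildArchive(String jobId, int subtasks) {
        Map<String, String> store = new TreeMap<>();
        for (int i = 0; i < subtasks; i++) {
            store.put("/jobs/" + jobId + "/subtasks/" + i + "/metrics", "{}");
        }
        return store;
    }

    // Counts the distinct directories and files that would exist if every
    // archive path were expanded into the local file system. Each distinct
    // path prefix corresponds to one entry (roughly one inode).
    static int countFsEntries(Iterable<String> paths) {
        Set<String> entries = new HashSet<>();
        for (String path : paths) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.split("/")) {
                if (part.isEmpty()) {
                    continue; // skip the empty segment before the leading '/'
                }
                prefix.append('/').append(part);
                entries.add(prefix.toString());
            }
        }
        return entries.size();
    }
}
```

Under these assumptions, 5000 subtasks yield 5000 keys in the key-value layout, while the expanded layout needs 10003 file-system entries: 3 shared parent directories, 5000 subtask directories, and 5000 files. A real key-value store would persist its data in a small number of files regardless of the key count, which is the property the proposal relies on.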



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
