Bumping this thread. Thanks!

Best regards,
Zihao

zihao chen <[email protected]> wrote on Sat, May 23, 2026 at 18:39:

> Hi all,
>
> Thanks everyone for the valuable feedback and discussions on this FLIP.
>
> Based on the discussion so far, the proposal has received generally
> positive feedback, and several important points have been clarified,
> including:
>
> - ArchiveStorage API design considerations
> - RocksDB deployment model and isolation between HistoryServer instances
> - Cleanup and retention strategy compatibility with existing mechanisms
>
> The earlier related discussion can be found here:
> https://lists.apache.org/thread/6thlq9c5twyvzmcw7q24nm4q0rcbz5qp
>
> If there are no further major concerns, I’m planning to start the VOTE
> thread next Tuesday.
>
> Please feel free to share any additional feedback before then.
>
> Best regards,
> Zihao
>
> zihao chen <[email protected]> wrote on Tue, May 19, 2026 at 21:05:
>
>> Hi Zuo,
>>
>> Thanks for your feedback and for aligning in this direction.
>>
>> Here are the clarifications regarding your questions:
>>
>> - *RocksDB Deployment*:
>>
>> Each RocksDB instance is coupled with a single HistoryServer instance
>> (each HistoryServer has its own independent local RocksDB). There is no
>> shared access between multiple HistoryServer instances.
>>
>> - *Cleanup Strategy*:
>>
>> The core cleanup still relies on the original ArchiveRetainedStrategy
>> (max job counts, TTL, etc.). While we've also implemented a
>> disk-capacity-based cleanup strategy in our internal practice to prevent
>> disk exhaustion, this feature is relatively independent. I've decoupled
>> it for now and will discuss it further in a follow-up FLIP.
>>
>> Let me know if this looks good to you!
>>
>> Best regards,
>> Zihao
>>
>> 魏祚 <[email protected]> wrote on Tue, May 19, 2026 at 17:33:
>>
>>> Hi Zihao,
>>>
>>> Thanks for your proposal. The excessive small files problem of
>>> HistoryServer is indeed a real pain point in large-scale production
>>> environments, and introducing RocksDB is a great idea.
>>> There are a few details I'd like to clarify:
>>>
>>> What is the deployment strategy for RocksDB? Is there a scenario where
>>> multiple HistoryServer instances share and access the same RocksDB
>>> instance? If so, are there any potential compatibility or concurrency
>>> risks?
>>>
>>> After introducing RocksDB, what is the strategy for cleaning up
>>> historical garbage files and expired job archives?
>>>
>>> Best regards,
>>> Zuo Wei
>>>
>>> ----- Original Message -----
>>> From: "zihao chen" <[email protected]>
>>> To: [email protected]
>>> Sent: Sat, 9 May 2026 11:37:08 +0800
>>> Subject: [DISCUSS] FLIP-XXX: Support Pluggable Storage Backend for
>>> HistoryServer
>>>
>>> Hi all,
>>>
>>> I’d like to start a discussion on FLIP-XXX:
>>>
>>> *Support Pluggable Storage Backend for HistoryServer*.
>>>
>>> This FLIP proposes improving the HistoryServer to address excessive
>>> *small files* when handling large numbers of archived jobs.
>>>
>>> [Proposal]
>>> Optional *RocksDB-based storage* to reduce small files
>>>
>>> [Compatibility]
>>> Full backward compatibility (FILE as default)
>>>
>>> The detailed design is described in the FLIP document:
>>>
>>> https://docs.google.com/document/d/1idHu5bq0GOsUuUAEIJSJ2UuekcDjbW0tHLNbsQfugDg/edit?usp=sharing
>>>
>>> This FLIP is split from the earlier discussion [1].
>>>
>>> Looking forward to your feedback.
>>>
>>> [1] https://lists.apache.org/thread/6thlq9c5twyvzmcw7q24nm4q0rcbz5qp
>>>
>>> Best regards,
>>>
>>> Zihao Chen
>>>
>>
