I think SHS only uses LevelDB/RocksDB to store intermediate data, so supporting re-parsing to rebuild the cache should be good enough.
I'd also like to share my experience using LevelDB/RocksDB for SHS. LevelDB seems to have native memory leak issues, at least in the SHS use case: I needed to restart the SHS every two months to recover it. The issue went away after upgrading to Spark 3.3 and switching to RocksDB.

Scale and performance: we keep ~800k application event logs in the event log HDFS directory, and re-parsing them with multiple threads to rebuild listing.rdb takes ~15 minutes.

Thanks,
Cheng Pan

> On Jun 6, 2025, at 15:36, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> IMHO, it's probably dependent on how long the rewrite will take, from reading
> the event log. If loading the state from LevelDB and rewriting to RocksDB is
> quite much faster, then we may want to support this for a couple minor
> releases to not force users to lose their cache. If there is no such
> difference, it is probably good to gradually migrate them automatically via
> opt-in for a couple minor releases. In both cases, we can enforce migration
> (neither opt-in nor opt-out) after that period.
>
> On Fri, Jun 6, 2025 at 10:51 AM Jia Fan <fanjia1...@gmail.com> wrote:
>> This is indeed an issue at the moment. Personally, I haven't found a
>> proper way to migrate data from LevelDB to RocksDB, as their storage
>> structures are different. Should we wait until a reasonable migration
>> solution becomes available before moving forward with this?
>>
>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote on Wed, May 28, 2025 at 15:41:
>> >
>> > Thanks for initiating this.
>> >
>> > I wonder if we don't have any compatibility issue on every component - SS
>> > area does not have an issue, but I don't quite remember if the history
>> > server would be OK with this. What is the story of the migration if they
>> > had been using leveldb? I guess it could be probably re-parsed, but do we
>> > need to ask users to perform some manual work to do that?
>> >
>> > On Wed, May 28, 2025 at 2:27 PM Yang Jie <yangji...@apache.org> wrote:
>> >>
>> >> The project "org.fusesource.leveldbjni:leveldbjni" released its last
>> >> version 12 years ago, and its code repository was last updated 8 years
>> >> ago. Consequently, I believe it's challenging for us to receive ongoing
>> >> maintenance and support from this project.
>> >>
>> >> On the flip side, when developers implement new features related to Spark
>> >> code, they have become accustomed to using rocksdb instead of leveldb.
>> >>
>> >> Furthermore, in Spark 4.0, support for leveldb was deprecated, and the
>> >> default implementation of the corresponding functionality was switched to
>> >> rocksdb.
>> >>
>> >> Given these factors, I support discontinuing support for leveldb.
>> >>
>> >>
>> >> Thanks
>> >> JIe Yang
>> >>
>> >> On 2025/05/27 08:26:06 Jia Fan wrote:
>> >> > Hi all,
>> >> >
>> >> > I'd like to start a discussion about removing LevelDB support from
>> >> > Apache Spark.
>> >> >
>> >> > As noted in
>> >> > SPARK-44223 (https://issues.apache.org/jira/browse/SPARK-44223),
>> >> > LevelDB support was deprecated in Spark 4.0. It's no longer actively
>> >> > maintained or widely used, and continuing to support it brings
>> >> > unnecessary maintenance and dependency complexity.
>> >> >
>> >> > A PR has been opened here to remove it entirely:
>> >> > https://github.com/apache/spark/pull/51027
>> >> >
>> >> > WDYT?
>> >> >
>> >> > Best regards,
>> >> > Jia Fan
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >> >