It's a bit different for users still on LevelDB: since it now requires opt-in, anyone still using it has chosen it deliberately, so they are likely to keep that config through the upgrade.
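
For concreteness, the opt-in would look roughly like the fragment below. This is only a sketch: the keys and values are the ones named in the migration notes quoted further down, while the file name and the choice to set both options together are assumptions that depend on how the History Server and shuffle service are deployed.

```properties
# spark-defaults.conf (assumed location; adjust to your deployment)

# History Server hybrid store: keep LevelDB as the disk backend.
# RocksDB is the default since Spark 3.4.
spark.history.store.hybridStore.enabled      true
spark.history.store.hybridStore.diskBackend  LEVELDB

# Shuffle service: keep LevelDB as the DB backend.
# RocksDB is the default since Spark 4.0.
spark.shuffle.service.db.backend             LEVELDB
```

A deployment that carries these lines explicitly is the case that removal would actually break; one that never set them is already on RocksDB.
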
From the initial post, there is a claim that we deprecated LevelDB in Apache Spark 4.0.0. May I ask what we did for the deprecation, and could you point me to the discussion about it? I just want to make sure we are not making a drastic change in a single minor version upgrade.

On Mon, Jun 9, 2025 at 3:01 PM Yang Jie <yangji...@apache.org> wrote:

> I would like to provide some new information:
>
> 1. Spark 3.4.0 [SPARK-42277] has started using RocksDB as the default
> option for `spark.history.store.hybridStore.diskBackend`.
>
> - Since Spark 3.4, Spark will use RocksDB store if
> `spark.history.store.hybridStore.enabled` is true. To restore the behavior
> before Spark 3.4, you can set `spark.history.store.hybridStore.diskBackend`
> to `LEVELDB`.
>
> 2. Spark 4.0.0 [SPARK-45351] has begun using RocksDB as the default option
> for `spark.shuffle.service.db.backend`.
>
> - Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB`
> by default which means Spark will use RocksDB store for shuffle service. To
> restore the behavior before Spark 4.0, you can set
> `spark.shuffle.service.db.backend` to `LEVELDB`.
>
> So for users who hadn't explicitly configured the aforementioned options
> to be `LEVELDB` before, the need for data reconstruction or re-parsing
> has already existed.
>
> On 2025/06/09 01:08:05 Jungtaek Lim wrote:
> > Thanks for the valuable input.
> >
> > I think it's more about the case where upgrading would surprise the end
> > users. If we simply remove LevelDB from the next release, we will be
> > removing this intermediate data as well and forcing them to rebuild
> > everything. 15 mins is probably not super long for the given volume, but
> > even a couple of additional minutes could bring a negative sentiment if
> > they ever opened this before.
> >
> > Would enabling the hybrid store reduce the surprise?
> > If so, maybe we could ask users to enable it, assigning a bit more
> > memory (+2g on the SHS process) if they didn't use the hybrid store.
> >
> > On Fri, Jun 6, 2025 at 5:08 PM, Cheng Pan <pan3...@gmail.com> wrote:
> >
> > > I think SHS only uses LevelDB/RocksDB to store intermediate data, so
> > > supporting re-parsing to rebuild the cache should be good enough.
> > >
> > > Also, to share my experience with LevelDB/RocksDB for SHS: LevelDB
> > > seems to have native memory leak issues, at least for the SHS use
> > > case. I needed to reboot the SHS every two months to recover; the
> > > issue was gone after upgrading to Spark 3.3 and switching to RocksDB.
> > >
> > > Scale and performance: we keep ~800k application event logs in the
> > > event log HDFS directory, and re-parsing with multiple threads to
> > > rebuild listing.rdb takes ~15 mins.
> > >
> > > Thanks,
> > > Cheng Pan
> > >
> > > On Jun 6, 2025, at 15:36, Jungtaek Lim <kabhwan.opensou...@gmail.com>
> > > wrote:
> > >
> > > IMHO, it probably depends on how long the rebuild from reading the
> > > event log will take. If loading the state from LevelDB and rewriting
> > > it to RocksDB is much faster, then we may want to support this for a
> > > couple of minor releases so that users are not forced to lose their
> > > cache. If there is no such difference, it is probably good to
> > > gradually migrate them automatically via opt-in for a couple of minor
> > > releases. In both cases, we can enforce migration (neither opt-in nor
> > > opt-out) after that period.
> > >
> > > On Fri, Jun 6, 2025 at 10:51 AM Jia Fan <fanjia1...@gmail.com> wrote:
> > >
> > >> This is indeed an issue at the moment. Personally, I haven't found a
> > >> proper way to migrate data from LevelDB to RocksDB, as their storage
> > >> structures are different. Should we wait until a reasonable migration
> > >> solution becomes available before moving forward with this?
> > >>
> > >> On Wed, May 28, 2025 at 15:41, Jungtaek Lim
> > >> <kabhwan.opensou...@gmail.com> wrote:
> > >> >
> > >> > Thanks for initiating this.
> > >> >
> > >> > I wonder whether we have compatibility issues in any component. The
> > >> > SS area does not have an issue, but I don't quite remember if the
> > >> > history server would be OK with this. What is the migration story
> > >> > for users who have been using leveldb? I guess the data could
> > >> > probably be re-parsed, but do we need to ask users to perform some
> > >> > manual work to do that?
> > >> >
> > >> > On Wed, May 28, 2025 at 2:27 PM Yang Jie <yangji...@apache.org>
> > >> > wrote:
> > >> >>
> > >> >> The project "org.fusesource.leveldbjni:leveldbjni" released its
> > >> >> last version 12 years ago, and its code repository was last
> > >> >> updated 8 years ago. Consequently, I believe it's challenging for
> > >> >> us to receive ongoing maintenance and support from this project.
> > >> >>
> > >> >> On the flip side, when developers implement new features related
> > >> >> to Spark code, they have become accustomed to using rocksdb
> > >> >> instead of leveldb.
> > >> >>
> > >> >> Furthermore, in Spark 4.0, support for leveldb was deprecated, and
> > >> >> the default implementation of the corresponding functionality was
> > >> >> switched to rocksdb.
> > >> >>
> > >> >> Given these factors, I support discontinuing support for leveldb.
> > >> >>
> > >> >> Thanks
> > >> >> Jie Yang
> > >> >>
> > >> >> On 2025/05/27 08:26:06 Jia Fan wrote:
> > >> >> > Hi all,
> > >> >> >
> > >> >> > I'd like to start a discussion about removing LevelDB support
> > >> >> > from Apache Spark.
> > >> >> >
> > >> >> > As noted in SPARK-44223
> > >> >> > (https://issues.apache.org/jira/browse/SPARK-44223),
> > >> >> > LevelDB support was deprecated in Spark 4.0. It's no longer
> > >> >> > actively maintained or widely used, and continuing to support it
> > >> >> > brings unnecessary maintenance and dependency complexity.
> > >> >> >
> > >> >> > A PR has been opened here to remove it entirely:
> > >> >> > https://github.com/apache/spark/pull/51027
> > >> >> >
> > >> >> > WDYT?
> > >> >> >
> > >> >> > Best regards,
> > >> >> > Jia Fan
> > >> >> >
> > >> >> > ---------------------------------------------------------------------
> > >> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org