Re: [DISCUSS] Dropping LevelDB support in Spark

Jungtaek Lim Mon, 09 Jun 2025 05:38:20 -0700

It's a bit different for users leveraging LevelDB: since it requires
opt-in, anyone still using it chose it deliberately, so they are
likely to retain the config during the upgrade.


From the initial post, there is a claim that we deprecated LevelDB in
Apache Spark 4.0.0. May I ask what we did for the deprecation, and could you
point me to the discussion about it? I just want to make sure we are not
making a drastic change in a single minor version upgrade.

On Mon, Jun 9, 2025 at 3:01 PM Yang Jie <yangji...@apache.org> wrote:

> I would like to provide some new information:
>
> 1. Since Spark 3.4.0 [SPARK-42277], RocksDB has been the default
> option for `spark.history.store.hybridStore.diskBackend`.
>
> - Since Spark 3.4, Spark will use RocksDB store if
> `spark.history.store.hybridStore.enabled` is true. To restore the behavior
> before Spark 3.4, you can set `spark.history.store.hybridStore.diskBackend`
> to `LEVELDB`.
>
> 2. Since Spark 4.0.0 [SPARK-45351], RocksDB has been the default option
> for `spark.shuffle.service.db.backend`.
>
> - Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB`
> by default which means Spark will use RocksDB store for shuffle service. To
> restore the behavior before Spark 4.0, you can set
> `spark.shuffle.service.db.backend` to `LEVELDB`.
>
> So users who had not explicitly set the options above to `LEVELDB`
> have already gone through data reconstruction or re-parsing.
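>
> For reference, a minimal sketch of restoring LevelDB for both components,
> assuming a deployment where the history server and the external shuffle
> service read their settings from `spark-defaults.conf` (a YARN
> NodeManager-hosted shuffle service may be configured on the NodeManager
> side instead). Only the keys and values named above are used:
>
> ```properties
> # Illustrative spark-defaults.conf excerpt, not a complete configuration.
> # The diskBackend setting applies when the SHS hybrid store is enabled;
> # its default became ROCKSDB in Spark 3.4 (SPARK-42277).
> spark.history.store.hybridStore.enabled      true
> spark.history.store.hybridStore.diskBackend  LEVELDB
>
> # External shuffle service state DB; its default became ROCKSDB
> # in Spark 4.0 (SPARK-45351).
> spark.shuffle.service.db.backend             LEVELDB
> ```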
>
> On 2025/06/09 01:08:05 Jungtaek Lim wrote:
> > Thanks for the valuable input.
> >
> > I think it's more about the case where upgrading would surprise the end
> > users. If we simply remove LevelDB in the next release, we will also be
> > removing this intermediate data and forcing users to rebuild
> > everything. 15 mins is probably not very long given the volume, but
> > even a couple of additional minutes could cause negative sentiment if they
> > have ever opened this before.
> >
> > Would enabling the hybrid store reduce the surprise? If so, maybe we could
> > ask users who don't use the hybrid store yet to enable it, assigning a bit
> > more memory (+2g for the SHS process).
> >
> > On Fri, Jun 6, 2025 at 5:08 PM Cheng Pan <pan3...@gmail.com> wrote:
> >
> > > I think SHS only uses LevelDB/RocksDB to store intermediate data, so
> > > supporting re-parsing to rebuild the cache should be good enough.
> > >
> > > I'd also like to share my experience using LevelDB/RocksDB for the SHS:
> > > LevelDB seems to have native memory leak issues, at least in the SHS use
> > > case. I needed to restart the SHS every two months to recover it, and the
> > > issue went away after upgrading to Spark 3.3 and switching to RocksDB.
> > >
> > > Scale and performance: we keep event logs for ~800k applications in the
> > > event log HDFS directory; re-parsing them with multiple threads to rebuild
> > > listing.rdb takes ~15 mins.
> > >
> > > Thanks,
> > > Cheng Pan
> > >
> > >
> > >
> > > On Jun 6, 2025, at 15:36, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> > >
> > > IMHO, it probably depends on how long the rewrite takes compared with
> > > re-parsing the event log. If loading the state from LevelDB and rewriting
> > > it to RocksDB is much faster, then we may want to support this for a
> > > couple of minor releases so as not to force users to lose their cache. If
> > > there is no such difference, it is probably fine to migrate them
> > > gradually and automatically via opt-in for a couple of minor releases. In
> > > both cases, we can enforce migration (neither opt-in nor opt-out) after
> > > that period.
> > >
> > > On Fri, Jun 6, 2025 at 10:51 AM Jia Fan <fanjia1...@gmail.com> wrote:
> > >
> > >> This is indeed an issue at the moment. Personally, I haven't found a
> > >> proper way to migrate data from LevelDB to RocksDB, as their storage
> > >> structures are different. Should we wait until a reasonable migration
> > >> solution becomes available before moving forward with this?
> > >>
> > >> On Wed, May 28, 2025 at 3:41 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> > >> >
> > >> > Thanks for initiating this.
> > >> >
> > >> > I want to make sure we have no compatibility issues in any component.
> > >> > The SS area does not have an issue, but I don't quite remember whether
> > >> > the history server would be OK with this. What is the migration story
> > >> > for users who have been using LevelDB? I guess the data could probably
> > >> > be re-parsed, but do we need to ask users to perform some manual work
> > >> > to do that?
> > >> >
> > >> > On Wed, May 28, 2025 at 2:27 PM Yang Jie <yangji...@apache.org> wrote:
> > >> >>
> > >> >> The project "org.fusesource.leveldbjni:leveldbjni" released its last
> > >> >> version 12 years ago, and its code repository was last updated 8 years
> > >> >> ago. Consequently, I believe it is unlikely that we will receive
> > >> >> ongoing maintenance and support from this project.
> > >> >>
> > >> >> On the other hand, developers implementing new Spark features have
> > >> >> become accustomed to using RocksDB instead of LevelDB.
> > >> >>
> > >> >> Furthermore, in Spark 4.0, support for LevelDB was deprecated, and the
> > >> >> default implementation of the corresponding functionality was switched
> > >> >> to RocksDB.
> > >> >>
> > >> >> Given these factors, I support discontinuing support for LevelDB.
> > >> >>
> > >> >>
> > >> >> Thanks
> > >> >> Jie Yang
> > >> >>
> > >> >> On 2025/05/27 08:26:06 Jia Fan wrote:
> > >> >> > Hi all,
> > >> >> >
> > >> >> > I'd like to start a discussion about removing LevelDB support from
> > >> >> > Apache Spark.
> > >> >> >
> > >> >> > As noted in SPARK-44223 (https://issues.apache.org/jira/browse/SPARK-44223),
> > >> >> > LevelDB support was deprecated in Spark 4.0. It’s no longer actively
> > >> >> > maintained or widely used, and continuing to support it brings
> > >> >> > unnecessary maintenance and dependency complexity.
> > >> >> >
> > >> >> > A PR has been opened here to remove it entirely:
> > >> >> > https://github.com/apache/spark/pull/51027
> > >> >> >
> > >> >> > WDYT?
> > >> >> >
> > >> >> > Best regards,
> > >> >> > Jia Fan
> > >> >> >
> > >> >> >
> > >> >> > ---------------------------------------------------------------------
> > >> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >> >> >
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >>
> > >
> > >
> >
>
>
>
