Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-09 Thread Jungtaek Lim
It's a bit different for users leveraging LevelDB: since it requires opt-in, anyone still using it has chosen it deliberately, so they are likely to retain the config during the upgrade. From the initial post, there is a claim that we deprecated LevelDB in Apache Spark 4.0.0. Shall I ask what

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Yang Jie
I would like to provide some new information: 1. Spark 3.4.0 [SPARK-42277] has started using RocksDB as the default option for `spark.history.store.hybridStore.diskBackend`. - Since Spark 3.4, Spark will use RocksDB store if `spark.history.store.hybridStore.enabled` is true. To restore the beha
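To illustrate the opt-in described above, here is a minimal Python sketch that reads `spark-defaults.conf`-style text and reports which disk backend the SHS hybrid store would use. It follows only the rule stated in this message: RocksDB is the default `spark.history.store.hybridStore.diskBackend` since 3.4, and the hybrid store applies when `spark.history.store.hybridStore.enabled` is true. The helper names `parse_spark_defaults` and `hybrid_store_backend` are hypothetical. The parser is a simplification (no line continuations or escapes), and treating an unset `enabled` as `false` is an assumption. This is not Spark's own config loader.

```python
HYBRID_ENABLED = "spark.history.store.hybridStore.enabled"
DISK_BACKEND = "spark.history.store.hybridStore.diskBackend"


def parse_spark_defaults(text):
    """Hypothetical, simplified parser for spark-defaults.conf-style text.

    Skips blank lines and '#'/'!' comments; accepts 'key value',
    'key=value', or 'key:value'. Does not handle continuations/escapes.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first whitespace, '=' or ':' character.
        for i, ch in enumerate(line):
            if ch.isspace() or ch in "=:":
                key = line[:i]
                # Drop any separator left after the whitespace, e.g. "key = value".
                value = line[i + 1:].strip().lstrip("=:").strip()
                break
        else:
            key, value = line, ""
        conf[key] = value
    return conf


def hybrid_store_backend(conf):
    """Return the hybrid store disk backend, or None if the hybrid store is off.

    Assumption: unset 'enabled' means false; unset backend means ROCKSDB
    (the default since Spark 3.4, per SPARK-42277).
    """
    if conf.get(HYBRID_ENABLED, "false").strip().lower() != "true":
        return None
    return conf.get(DISK_BACKEND, "ROCKSDB").strip().upper()


if __name__ == "__main__":
    sample = """
# SHS settings
spark.history.store.hybridStore.enabled   true
spark.history.store.hybridStore.diskBackend=leveldb
"""
    backend = hybrid_store_backend(parse_spark_defaults(sample))
    print("LevelDB opt-in detected" if backend == "LEVELDB" else f"backend: {backend}")
```

A check like this only flags deployments that explicitly set LEVELDB; anything left at the default already uses RocksDB.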

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Jungtaek Lim
Thanks for the valuable input. I think it's more about the case where upgrading would surprise end users. If we simply remove LevelDB in the next release, we will also be removing this intermediate data and forcing users to rebuild everything. 15 mins is probably not super long from the

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Cheng Pan
I think SHS only uses LevelDB/RocksDB to store intermediate data, so supporting re-parsing to rebuild the cache should be good enough. Also, to share my experience using LevelDB/RocksDB for SHS: LevelDB seems to have native memory leak issues, at least for the SHS use case, and I need to reboot the S

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Jungtaek Lim
IMHO, it probably depends on how long the rewrite would take compared with rebuilding from the event log. If loading the state from LevelDB and rewriting it to RocksDB is much faster, then we may want to support this for a couple of minor releases so as not to force users to lose their cache. If there is no such diff

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-05 Thread Jia Fan
This is indeed an issue at the moment. Personally, I haven't found a proper way to migrate data from LevelDB to RocksDB, as their storage structures are different. Should we wait until a reasonable migration solution becomes available before moving forward with this? Jungtaek Lim on Wed, May 28, 2025 at 15

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-05-28 Thread Jungtaek Lim
Thanks for initiating this. I wonder whether we have compatibility issues in any component. The SS (Structured Streaming) area does not have an issue, but I don't quite remember whether the history server would be OK with this. What is the migration story for users who have been using LevelDB? I guess it could probably

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-05-27 Thread Yang Jie
The project "org.fusesource.leveldbjni:leveldbjni" released its last version 12 years ago, and its code repository was last updated 8 years ago. Consequently, I believe it's challenging for us to receive ongoing maintenance and support from this project. On the flip side, when developers implem

[DISCUSS] Dropping LevelDB support in Spark

2025-05-27 Thread Jia Fan
Hi all, I'd like to start a discussion about removing LevelDB support from Apache Spark. As noted in SPARK-44223 (https://issues.apache.org/jira/browse/SPARK-44223), LevelDB support was deprecated in Spark 4.0. It’s no longer actively maintained or widely used, and continuing to support it brings