I think SHS only uses LevelDB/RocksDB to store intermediate data, so supporting
re-parsing to rebuild the cache should be good enough.

I'd also like to share my experience using LevelDB/RocksDB for SHS. LevelDB
seems to have native memory leak issues, at least for the SHS use case: I had
to restart the SHS every two months to recover. The issue went away after
upgrading to Spark 3.3 and switching to RocksDB.
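
For anyone wanting to make the same switch, a minimal sketch of the relevant
spark-defaults.conf entries follows. It assumes the standard history server
settings; the store path is a made-up placeholder, and the defaults are worth
confirming against the monitoring docs for your Spark version.

```properties
# Local directory for the SHS disk store (hypothetical placeholder path)
spark.history.store.path                     /path/to/shs/store
# Disk backend for the SHS store: LEVELDB or ROCKSDB (option added in Spark 3.3)
spark.history.store.hybridStore.diskBackend  ROCKSDB
```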

Scale and performance: we keep ~800k application event logs in the event log
HDFS directory, and re-parsing them with multiple threads to rebuild
listing.rdb takes ~15 minutes.
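
The replay parallelism should be tunable. A sketch, assuming the thread count
is governed by spark.history.fs.numReplayThreads; the value below is only an
illustration, not a measured or recommended setting.

```properties
# Number of threads the history server uses to process event logs
# (illustrative value; the documented default is 25% of available cores)
spark.history.fs.numReplayThreads  16
```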

Thanks,
Cheng Pan



> On Jun 6, 2025, at 15:36, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> 
> IMHO, it probably depends on how long rebuilding the store by re-reading
> the event logs takes. If loading the state from LevelDB and rewriting it to
> RocksDB is much faster, then we may want to support this for a couple of
> minor releases so that users are not forced to lose their cache. If there is
> no such difference, it is probably good to gradually migrate them
> automatically via opt-in for a couple of minor releases. In both cases, we
> can enforce migration (neither opt-in nor opt-out) after that period.
> 
> On Fri, Jun 6, 2025 at 10:51 AM Jia Fan <fanjia1...@gmail.com 
> <mailto:fanjia1...@gmail.com>> wrote:
>> This is indeed an issue at the moment. Personally, I haven't found a
>> proper way to migrate data from LevelDB to RocksDB, as their storage
>> structures are different. Should we wait until a reasonable migration
>> solution becomes available before moving forward with this?
>> 
>> Jungtaek Lim <kabhwan.opensou...@gmail.com 
>> <mailto:kabhwan.opensou...@gmail.com>> wrote on Wed, May 28, 2025 at 15:41:
>> >
>> > Thanks for initiating this.
>> >
>> > I wonder whether this raises compatibility issues in any component. The
>> > SS area does not have an issue, but I don't quite remember whether the
>> > history server would be OK with this. What is the migration story for
>> > users who have been using leveldb? I guess the data could probably be
>> > re-parsed, but do we need to ask users to perform some manual work to do
>> > that?
>> >
>> > On Wed, May 28, 2025 at 2:27 PM Yang Jie <yangji...@apache.org 
>> > <mailto:yangji...@apache.org>> wrote:
>> >>
>> >> The project "org.fusesource.leveldbjni:leveldbjni" released its last 
>> >> version 12 years ago, and its code repository was last updated 8 years 
>> >> ago. Consequently, I believe it's challenging for us to receive ongoing 
>> >> maintenance and support from this project.
>> >>
>> >> On the flip side, when developers implement new features related to Spark 
>> >> code, they have become accustomed to using rocksdb instead of leveldb.
>> >>
>> >> Furthermore, in Spark 4.0, support for leveldb was deprecated, and the 
>> >> default implementation of the corresponding functionality was switched to 
>> >> rocksdb.
>> >>
>> >> Given these factors, I support discontinuing support for leveldb.
>> >>
>> >>
>> >> Thanks
>> >> Jie Yang
>> >>
>> >> On 2025/05/27 08:26:06 Jia Fan wrote:
>> >> > Hi all,
>> >> >
>> >> > I'd like to start a discussion about removing LevelDB support from 
>> >> > Apache Spark.
>> >> >
>> >> > As noted in 
>> >> > SPARK-44223 (https://issues.apache.org/jira/browse/SPARK-44223),
>> >> > LevelDB support was deprecated in Spark 4.0. It’s no longer actively
>> >> > maintained or widely used, and continuing to support it brings
>> >> > unnecessary maintenance and dependency complexity.
>> >> >
>> >> > A PR has been opened here to remove it entirely:
>> >> > https://github.com/apache/spark/pull/51027
>> >> >
>> >> > WDYT?
>> >> >
>> >> > Best regards,
>> >> > Jia Fan
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
>> >> > <mailto:dev-unsubscr...@spark.apache.org>
>> >> >
>> >> >
>> >>
