Thanks for the valuable input. I think the concern is more about upgrades surprising end users. If we simply remove LevelDB in the next release, we also remove this intermediate data and force users to rebuild everything. 15 minutes is probably not very long for the given volume, but even a couple of additional minutes could create negative sentiment among users who have used the history server before.
Would enabling the hybrid store reduce the surprise? If so, maybe we could ask users who aren't already using the hybrid store to enable it and assign a bit more memory (+2g on the SHS process).

On Fri, Jun 6, 2025 at 5:08 PM, Cheng Pan <pan3...@gmail.com> wrote:

> I think SHS only uses LevelDB/RocksDB to store intermediate data,
> supporting re-parsing to rebuild the cache should be fine enough.
>
> Also share my experience about using LevelDB/RocksDB for SHS, it seems
> LevelDB has native memory leak issues, at least for the SHS use case, I
> need to reboot the SHS for every two months to recover it, issue gone after
> upgrading to Spark 3.3 and switching to RocksDB.
>
> Scale and Performance: we keep ~800k applications event logs for the event
> log HDFS directory, multiple threads re-parsing to rebuild listing.rdb
> takes ~15mins.
>
> Thanks,
> Cheng Pan
>
> On Jun 6, 2025, at 15:36, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> IMHO, it's probably dependent on how long the rewrite will take, from
> reading the event log. If loading the state from LevelDB and rewriting to
> RocksDB is quite much faster, then we may want to support this for a couple
> minor releases to not force users to lose their cache. If there is no such
> difference, it is probably good to gradually migrate them automatically via
> opt-in for a couple minor releases. In both cases, we can enforce migration
> (neither opt-in nor opt-out) after that period.
>
> On Fri, Jun 6, 2025 at 10:51 AM Jia Fan <fanjia1...@gmail.com> wrote:
>
>> This is indeed an issue at the moment. Personally, I haven't found a
>> proper way to migrate data from LevelDB to RocksDB, as their storage
>> structures are different. Should we wait until a reasonable migration
>> solution becomes available before moving forward with this?
>>
>> On Wed, May 28, 2025 at 15:41, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>> >
>> > Thanks for initiating this.
>> >
>> > I wonder if we don't have any compatibility issue on every component -
>> > SS area does not have an issue, but I don't quite remember if the history
>> > server would be OK with this. What is the story of the migration if they
>> > had been using leveldb? I guess it could be probably re-parsed, but do we
>> > need to ask users to perform some manual work to do that?
>> >
>> > On Wed, May 28, 2025 at 2:27 PM Yang Jie <yangji...@apache.org> wrote:
>> >>
>> >> The project "org.fusesource.leveldbjni:leveldbjni" released its last
>> >> version 12 years ago, and its code repository was last updated 8 years ago.
>> >> Consequently, I believe it's challenging for us to receive ongoing
>> >> maintenance and support from this project.
>> >>
>> >> On the flip side, when developers implement new features related to
>> >> Spark code, they have become accustomed to using rocksdb instead of leveldb.
>> >>
>> >> Furthermore, in Spark 4.0, support for leveldb was deprecated, and the
>> >> default implementation of the corresponding functionality was switched to
>> >> rocksdb.
>> >>
>> >> Given these factors, I support discontinuing support for leveldb.
>> >>
>> >>
>> >> Thanks
>> >> JIe Yang
>> >>
>> >> On 2025/05/27 08:26:06 Jia Fan wrote:
>> >> > Hi all,
>> >> >
>> >> > I'd like to start a discussion about removing LevelDB support from
>> >> > Apache Spark.
>> >> >
>> >> > As noted in SPARK-44223
>> >> > (https://issues.apache.org/jira/browse/SPARK-44223),
>> >> > LevelDB support was deprecated in Spark 4.0. It’s no longer actively
>> >> > maintained or widely used, and continuing to support it brings
>> >> > unnecessary maintenance and dependency complexity.
>> >> >
>> >> > A PR has been opened here to remove it entirely:
>> >> > https://github.com/apache/spark/pull/51027
>> >> >
>> >> > WDYT?
>> >> >
>> >> > Best regards,
>> >> > Jia Fan
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org