I think both Celeborn and Uniffle are good options for a general-purpose shuffle service, and I recommend trying both :). For any questions about Celeborn, we're glad to discuss them on Celeborn's mailing lists [1][2] or Slack [3].

[1] u...@celeborn.apache.org
[2] d...@celeborn.apache.org
[3] https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q

Thanks,
Keyong Zhou

On 2023/10/31 14:24:38 "Battula, Brahma Reddy" wrote:
> Thanks for bringing up this. Good to see that it supports spark and flink.
>
> Have you done comparison between uniffle and celeborn..?
>
> On 30/10/23, 8:01 AM, "Keyong Zhou" <zho...@apache.org> wrote:
>
> Great to hear this! It's encouraging that Celeborn helps MR3.
>
> Celeborn is a general purpose remote shuffle service that stores and serves
> shuffle data (and other intermediate data in the future) to help compute
> engines better use disaggregated architecture, as well as become more
> efficient and stable for huge shuffle sized jobs.
>
> Currently Celeborn supports Hive on MR, and I think integrating with MR3
> provides a good example to support Hive on Tez.
>
> Thanks,
> Keyong Zhou
>
> On 2023/10/24 12:08:54 Sungwoo Park wrote:
> > Hi Hive users,
> >
> > Before the impending release of MR3 1.8, we would like to announce the
> > release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> > 0.3.1).
> >
> > Apache Celeborn [1] is remote shuffle service, similar to Magnet [2] and
> > Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> > ago). Celeborn officially supports Spark and Flink, and we have implemented
> > an MR3-extension for Celeborn.
> >
> > In addition to all the benefits of using remote shuffle service,
> > Hive-MR3-Celeborn supports direct processing of mapper output on the
> > reducer side, which means that reducers do not store mapper output on local
> > disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> > over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> > This can be particularly useful when running Hive-MR3 on public clouds
> > where fast local disk storage is expensive or not available.
> >
> > We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> > Hive-MR3-Celeborn in [5].
> >
> > FYI, MR3 is an execution engine providing native support for Hadoop,
> > Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> > provides the performance of LLAP yet is very easy to install and operate.
> > If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> > give you a much higher throughput thanks to its advanced resource sharing
> > model.
> >
> > We have recently opened a Slack channel. If interested, please join the
> > Slack channel and ask any question on MR3:
> >
> > https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
> >
> > Thank you,
> >
> > --- Sungwoo
> >
> > [1] https://celeborn.apache.org/
> > [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> > [3] https://uniffle.apache.org/
> > [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> > [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> > [6] https://mr3docs.datamonad.com/