Celeborn and Uniffle can also be seen as a move to separate local storage from compute nodes.
1. In the old days, Hadoop was based on the idea of collocating compute and storage.
2. Later, a new paradigm of separating compute and storage emerged and became popular.
3. Now people want not just to separate compute and storage, but also to separate local storage from compute nodes.

In the future, all shuffle/spill files might be stored in a dedicated system such as Celeborn or Uniffle. In our case of developing Hive-MR3, we completely removed spill files for unordered edges thanks to the efficient buffering in Celeborn.

Thanks,

--- Sungwoo

On Thu, Nov 2, 2023 at 7:31 PM Keyong Zhou <zho...@apache.org> wrote:

> I think both Celeborn and Uniffle are good alternatives as a general
> shuffle service.
> I recommend that you try them : ). For any question about Celeborn, we're
> very glad to discuss in Celeborn's mail lists[1][2] or slack[3].
>
> [1] u...@celeborn.apache.org
> [2] d...@celeborn.apache.org
> [3] https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q
>
> Thanks,
> Keyong Zhou
>
> On 2023/10/31 14:24:38 "Battula, Brahma Reddy" wrote:
> > Thanks for bringing up this. Good to see that it supports spark and flink.
> >
> > Have you done comparison between uniffle and celeborn..?
> >
> > On 30/10/23, 8:01 AM, "Keyong Zhou" <zho...@apache.org> wrote:
> >
> > Great to hear this! It's encouraging that Celeborn helps MR3.
> >
> > Celeborn is a general purpose remote shuffle service that stores and serves
> > shuffle data (and other intermediate data in the future) to help compute
> > engines better use disaggregated architecture, as well as become more
> > efficient and stable for huge shuffle sized jobs.
> >
> > Currently Celeborn supports Hive on MR, and I think integrating with MR3
> > provides a good example to support Hive on Tez.
> >
> > Thanks,
> > Keyong Zhou
> >
> > On 2023/10/24 12:08:54 Sungwoo Park wrote:
> > > Hi Hive users,
> > >
> > > Before the impending release of MR3 1.8, we would like to announce the
> > > release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> > > 0.3.1).
> > >
> > > Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
> > > Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> > > ago). Celeborn officially supports Spark and Flink, and we have implemented
> > > an MR3-extension for Celeborn.
> > >
> > > In addition to all the benefits of using remote shuffle service,
> > > Hive-MR3-Celeborn supports direct processing of mapper output on the
> > > reducer side, which means that reducers do not store mapper output on local
> > > disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> > > over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> > > This can be particularly useful when running Hive-MR3 on public clouds
> > > where fast local disk storage is expensive or not available.
> > >
> > > We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> > > Hive-MR3-Celeborn in [5].
> > >
> > > FYI, MR3 is an execution engine providing native support for Hadoop,
> > > Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> > > provides the performance of LLAP yet is very easy to install and operate.
> > > If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> > > give you a much higher throughput thanks to its advanced resource sharing
> > > model.
> > >
> > > We have recently opened a Slack channel. If interested, please join the
> > > Slack channel and ask any question on MR3:
> > >
> > > https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
> > >
> > > Thank you,
> > >
> > > --- Sungwoo
> > >
> > > [1] https://celeborn.apache.org/
> > > [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> > > [3] https://uniffle.apache.org/
> > > [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> > > [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> > > [6] https://mr3docs.datamonad.com/