Announce: Hive-MR3 with Celeborn

Sungwoo Park Tue, 24 Oct 2023 05:10:22 -0700

Hi Hive users,

Before the impending release of MR3 1.8, we would like to announce the
release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
0.3.1).


Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
Apache Uniffle [3] (which was discussed on this Hive mailing list a while
ago). Celeborn officially supports Spark and Flink, and we have implemented
an MR3 extension for Celeborn.

In addition to all the benefits of using a remote shuffle service,
Hive-MR3-Celeborn supports direct processing of mapper output on the
reducer side, which means that reducers do not store mapper output on local
disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
This can be particularly useful when running Hive-MR3 on public clouds
where fast local disk storage is expensive or not available.

We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
Hive-MR3-Celeborn from [5].

FYI, MR3 is an execution engine providing native support for Hadoop,
Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
provides the performance of LLAP yet is very easy to install and operate.
If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
give you a much higher throughput thanks to its advanced resource sharing
model.

We have recently opened a Slack channel. If interested, please join the
Slack channel and ask any questions about MR3:

https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg

Thank you,

--- Sungwoo

[1] https://celeborn.apache.org/
[2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
[3] https://uniffle.apache.org/
[4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
[5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
[6] https://mr3docs.datamonad.com/
