Hi Hive users,

Ahead of the upcoming release of MR3 1.8, we would like to announce the release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1).

Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache Uniffle [3] (which was discussed on this mailing list a while ago). Celeborn officially supports Spark and Flink, and we have implemented an MR3 extension for Celeborn.

In addition to all the benefits of using a remote shuffle service, Hive-MR3-Celeborn supports direct processing of mapper output on the reducer side, which means that reducers do not store mapper output on local disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate over 95% of local disk writes when tested on the 10TB TPC-DS benchmark. This can be particularly useful when running Hive-MR3 on public clouds where fast local disk storage is expensive or not available.

We have documented the usage of Hive-MR3-Celeborn in [4]. You can download Hive-MR3-Celeborn from [5].

FYI, MR3 is an execution engine providing native support for Hadoop, Kubernetes, and standalone mode [6]. Hive-MR3, its main application, provides the performance of LLAP yet is very easy to install and operate. If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will give you much higher throughput thanks to its advanced resource sharing model.

We have recently opened a Slack channel. If interested, please join the Slack channel and ask any questions about MR3:

https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg

Thank you,

--- Sungwoo

[1] https://celeborn.apache.org/
[2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
[3] https://uniffle.apache.org/
[4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
[5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
[6] https://mr3docs.datamonad.com/