Hive 4.0.1 on MR3 released

Sungwoo Park Thu, 14 Nov 2024 09:21:41 -0800

Hi everyone,

We have released Hive 4.0.1 on MR3. Its git repository is at:

https://github.com/mr3project/hive-mr3

For those interested, here is a short introduction to Hive on MR3.

Apache Hive continues to make consistent progress in adding new features
and optimizations. However, its execution engine, Tez, is currently not
adding new features to adapt to changing environments.

Hive on MR3 replaces Tez with another fault-tolerant execution engine, MR3,
and provides additional features that can be implemented only at the
execution engine layer. Here is a list of such features.

1. You can run Apache Hive directly on Kubernetes (including AWS EKS), by
creating and deleting Kubernetes pods. Compaction and distcp jobs (which
are originally MapReduce jobs) are also executed directly on Kubernetes.
Hive on MR3 on Kubernetes + S3 is a good working combination.

2. You can run Apache Hive without upgrading Hadoop. You can also run
Apache Hive in standalone mode (similar to Spark standalone mode) without
requiring a resource manager such as YARN or Kubernetes. Overall, Hive on
MR3 is very easy to install and set up.

3. Unlike in Apache Hive, an instance of DAGAppMaster can manage many
concurrent DAGs. A single high-capacity DAGAppMaster (e.g., with 200+GB of
memory) can handle over a hundred concurrent DAGs without needing to be
restarted.

4. Like an LLAP daemon, a worker can execute many concurrent tasks.
These workers are shared across DAGs, so one usually creates large workers
(e.g., with 100+GB of memory) that run like daemons.

5. Hive on MR3 automatically achieves the speed of LLAP without requiring
any further configuration. On TPC-DS workloads, Hive on MR3 is actually
faster than Hive-LLAP. In our latest benchmark on 10TB TPC-DS, Hive on MR3
also runs faster than Trino 453.

6. Apache Hive will support Java 17 starting with its 4.1.0 release, but
Hive on MR3 already supports Java 17.

7. Hive on MR3 supports a remote shuffle service. Currently we support
Apache Celeborn 0.5.1 with fault tolerance. If you would like to run Hive
on public clouds with a dedicated shuffle service, Hive on MR3 is a ready
solution.

If you are interested, please check out the quick start guide:

https://mr3docs.datamonad.com/docs/quick/

Thanks,

--- Sungwoo
