I'm using Hive 3.1 on Tez/LLAP, and I must say the experience was not good,
but it was worth it. We built Hive from HDP's hive-release and added the Tez
UI back, and combined that with Hue 4.3 (also built from Cloudera's Hue). Now
that the two companies have merged, I think things are going to get better
(I'm not an enterprise user of either CDH or HDP; we build our own distro
based on their open-source versions). Hue is now trying to integrate with
Atlas and Ranger, which is a really good step.

We like Tez because it has been stable enough for batch processing jobs. The
LLAP and vectorized side of things is a different story, and that's where the
new Hive is heading. Historically, though, it hasn't been as stable as pure
Tez containers in our opinion. LLAP + vectorized execution can bring query
times down to sub-seconds if you have the hardware for it (instances with at
least 128G of memory and a good 10Gbit network, e.g. i3.4xlarge on AWS). In a
few cases it's actually faster than Presto (and AWS Athena in our case),
though I would say they are very comparable.
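
For context, the kind of session settings involved look roughly like this (a
sketch, not our exact configuration; the property names are standard Hive
ones, but whether you set them per session or in hive-site.xml depends on
your cluster):

    -- illustrative session-level settings for LLAP + vectorized execution
    SET hive.execution.engine=tez;
    SET hive.execution.mode=llap;               -- run query fragments inside LLAP daemons
    SET hive.llap.execution.mode=all;           -- push as much work as possible into LLAP
    SET hive.vectorized.execution.enabled=true; -- process rows in batches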

I like the fact that we can use a single SQL dialect for both batch and
interactive queries by combining Hive 3.x on Tez with Hive 3.1 on LLAP.
There's no context switching between dialects wasting our time on LATERAL
VIEW explode(..) vs. CROSS JOIN unnest(...).
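
To illustrate the kind of dialect difference I mean (table and column names
here are made up), flattening an array column is written one way in Hive and
another way in Presto/Athena, even though the result is the same:

    -- Hive
    SELECT order_id, item
    FROM orders
    LATERAL VIEW explode(items) t AS item;

    -- Presto / Athena
    SELECT order_id, item
    FROM orders
    CROSS JOIN UNNEST(items) AS t (item);

Staying on one engine for both batch and interactive work means we only ever
write the first form.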

One thing I must say, though: Hive 3 has a few backwards-incompatible
changes, so be careful. For example, the change that makes managed tables
transactional (ACID) by default has broken many of our assumptions. I wish
the Hive team would keep things more backward-compatible. Hive is such an
enormous system with widespread impact that any backward-incompatible change
can cause an uproar in the community.
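
As a concrete illustration of that default change (table names and paths here
are hypothetical, and the exact behavior depends on how the Hive 3 / HDP 3
defaults are configured):

    -- With the new defaults, a plain managed table comes out as a full ACID table:
    CREATE TABLE events (id BIGINT, payload STRING) STORED AS ORC;
    -- effectively gets TBLPROPERTIES ('transactional'='true')

    -- To keep the old behavior of plain files that external tools can read directly,
    -- we now have to be explicit and use an external table:
    CREATE EXTERNAL TABLE events_raw (id BIGINT, payload STRING)
    STORED AS ORC
    LOCATION '/warehouse/external/events_raw';

Any code that assumed a managed table was just a directory of ORC/text files
had to be revisited.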

On Tue, Apr 16, 2019 at 8:08 AM Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> I have changed jobs 3 times since Tez was introduced. It is a true waste
> of compute resources and time that it was never patched in. So I either
> have to waste my time patching it in, waste my time running a side
> deployment, or not install it and waste money having queries run longer
> on the mr/spark engine.
>
> Imagine how much compute hours have been lost world wide.
> On Tuesday, April 16, 2019, Manoj Murumkar <manoj.murum...@gmail.com>
> wrote:
>
>> If we install our own build of Hive, we'll be out of support from CDH.
>>
>> Tez is not supported anyway and we're not touching any CDH bits, so it's
>> not a big issue to have our own build of Tez engine.
>>
>> > On Apr 15, 2019, at 9:20 PM, Gopal Vijayaraghavan <gop...@apache.org>
>> wrote:
>> >
>> >
>> > Hi,
>> >
>> >>> However, we have built Tez on CDH and it runs just fine.
>> >
>> > Down that path you'll also need to deploy a slightly newer version of
>> Hive as well, because Hive 1.1 is a bit ancient & has known bugs with the
>> tez planner code.
>> >
>> > You effectively end up building the hortonworks/hive-release builds, by
>> undoing the non-htrace tracing impl & applying the htrace one back etc.
>> >
>> >> Lol. I was hoping that the merger would unblock the "saltyness".
>> >
>> > Historically, I've unofficially supported folks using Tez on CDH in
>> prod (assuming they buy me enough coffee), though I might have to discontinue
>> that.
>> >
>> >
>> https://github.com/t3rmin4t0r/tez-autobuild/blob/llap/vendor-repos.xml#L11
>> >
>> > Cheers,
>> > Gopal
>> >
>> >
>>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


-- 
Thai
