Hi Sungwoo,

Thanks for reporting the impact of those patches. I'm happy to see the
running time going down.
This is a great thread for learning the mechanism of
`tez.runtime.pipelined-shuffle.enabled`. I also want to withdraw my
earlier comment about its priority: if users are willing to use it, it
has its use cases. I'd be excited if we found a good architecture that
keeps Hive's excellent pull-based shuffle. I think it's well-designed
and well-implemented.

As for query 29, it's interesting. We'd also like to check the reason
when we have a chance.

Thanks,
Okumin

On Fri, Nov 29, 2024 at 11:56 AM Sungwoo Park <glap...@gmail.com> wrote:
>
> Hello,
>
>> We've merged all three pull requests. Thanks for your contributions.
>
> The updated version of HIVE-28489 additionally reduces the total
> running time of 10TB TPC-DS by about 100 seconds. So, the total running
> time now decreases from around 5700s to 5200s. Considering the maturity
> of the Hive compiler today, I would say this is a significant
> improvement in performance. Thank you for merging the patches.
>
>> > 1. The query plan is identical, but Trino is much faster. This is
>> > due to the architectural difference between Trino and Hive (on
>> > shuffle-intensive queries): Trino is based on MPP and thus uses the
>> > push model, while Hive uses the pull model. There is not much we can
>> > do about this type of query. (Note that the push model has its own
>> > drawbacks and thus does not always win over the pull model. That's
>> > why Trino is much slower than Hive on many queries.)
>>
>> If we'd like to accelerate those queries, we may be able to enhance
>> `tez.runtime.pipelined-shuffle.enabled`. I've not used this feature,
>> and IMO the priority is lower considering the use case of Apache Hive.
>
> Setting tez.runtime.pipelined-shuffle.enabled to true can be useful for
> a particular type of query, but it produces only a small speedup on
> average (no more than 4 percent) when tested with 10TB TPC-DS.
> Besides, it has its own drawback that it cannot be used with
> speculative execution (which is important for dealing with occasional
> fetch delays).
>
> Pipelined shuffling in Tez is different from pipelined shuffling in
> Trino, which never writes to local disks. In Tez, the outputs of
> mappers are always written to local disks regardless of pipelined
> shuffling. It's just that the output can be written incrementally in
> separate chunks, and each chunk can be shuffled to downstream tasks as
> soon as it is created.
>
> So, this problem can be solved only at the level of the execution
> engine. For me, it's a high-priority problem because Hive in LLAP mode
> comes quite close to Trino in performance and is already a strong
> contender as an interactive query engine.
>
>> > 2. Trino generates a query plan that is clearly more efficient than
>> > Hive's. We made some attempts to find a solution in Hive, but came
>> > to a preliminary conclusion that this would require a significant
>> > change in the query compiler (e.g., if a decision made later during
>> > query compilation is inconsistent with an earlier assumption, retry
>> > with a different assumption until consistency is reached).
>>
>> Interesting. I would like to know more details about those points. We
>> can help with that part, as I am also involved with Trino.
>
> An example is query 29 of TPC-DS. Trino chooses MapJoin while Hive
> chooses DynamicPartitionedHashJoin, and the join orderings are very
> different:
>
> Trino: ((ss + sr) + cs) + i + s
> Hive: (cs + sr) + (ss + i) + s
>
> As a result, Hive shuffles a lot of intermediate data and runs much
> slower than Trino (e.g., 104 seconds vs 15 seconds on 10TB TPC-DS).
> Before trying to find a fix in Hive, I would like to understand why
> Trino chooses a simple yet efficient query plan for query 29. I am not
> actively working on this problem at the moment, and will create a JIRA
> issue when I make some progress. Thanks.
>
> Regards,
>
> --- Sungwoo Park
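For readers who want to experiment with the setting discussed above,
here is a minimal sketch of enabling it for a single Hive-on-Tez
session. This is an assumption-laden sketch, not a verified recipe: the
final-merge property is included because Tez's pipelined shuffle
reportedly requires the final merge in the output to be disabled, and
the query file name `query29.sql` is purely hypothetical.

```shell
# Sketch only: enable Tez pipelined shuffle for one Hive session.
# The final-merge prerequisite is an assumption from Tez's shuffle
# design; verify against your Tez version before relying on it.
# query29.sql is a hypothetical file name.
hive \
  --hiveconf tez.runtime.pipelined-shuffle.enabled=true \
  --hiveconf tez.runtime.enable.final-merge.in.output=false \
  -f query29.sql
```

Also note the caveat from the thread: pipelined shuffle cannot be
combined with speculative execution, so it should be disabled for that
session.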
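The incremental-chunk behavior described in the thread (mapper output
still written to local disk, but in chunks that downstream tasks can
fetch as soon as each chunk is written) can be illustrated with a toy
timing model. This is not Tez code; the chunk count and per-chunk costs
are made-up abstract units chosen only to show how overlapping map
output with fetches shortens the end-to-end time.

```python
# Toy model of non-pipelined vs pipelined shuffle timing.
# All times are abstract units; one mapper, one reducer, serial fetches.

def non_pipelined(chunks, map_cost, fetch_cost):
    # Reducer starts fetching only after the mapper's entire output
    # (all chunks) is on local disk.
    map_done = len(chunks) * map_cost
    return map_done + len(chunks) * fetch_cost

def pipelined(chunks, map_cost, fetch_cost):
    # Chunk i becomes fetchable the moment it is written; the fetch of
    # chunk i starts at max(chunk i ready, previous fetch finished).
    fetch_done = 0.0
    for i in range(len(chunks)):
        ready = (i + 1) * map_cost
        fetch_done = max(ready, fetch_done) + fetch_cost
    return fetch_done

chunks = list(range(8))
print(non_pipelined(chunks, 1.0, 0.5))  # 12.0: map 8.0, then fetch 4.0
print(pipelined(chunks, 1.0, 0.5))      # 8.5: fetches overlap with map
```

The gap between the two numbers is the fetch work hidden behind the map
phase, which is the speedup pipelined shuffle aims for; it says nothing
about the real-world caveats (extra chunk overhead, no speculative
execution) raised in the thread.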