Hi Sungwoo,

Thanks for reporting the impact of those patches. I'm happy to see the
running time going down.
This is a great thread for learning the mechanism of
`tez.runtime.pipelined-shuffle.enabled`. I also want to withdraw my
earlier comment about its priority: if users are willing to use it, it
has its use cases. I'd be excited if we found a good architecture that
keeps Hive's excellent pull-based shuffle. I think it's well-designed
and well-implemented.

As for query 29, it's interesting. We'd also like to check the reason
when we have a chance.

Thanks,
Okumin

On Fri, Nov 29, 2024 at 11:56 AM Sungwoo Park <glap...@gmail.com> wrote:
>
> Hello,
>
>> We've merged all three pull requests. Thanks for your contributions.
>
> The updated version of HIVE-28489 additionally reduces the total
> running time of 10TB TPC-DS by about 100 seconds. So, the total running
> time now decreases from around 5700s to 5200s. Considering the maturity
> of the Hive compiler today, I would say this is a significant
> improvement in performance. Thank you for merging the patches.
>
>> > 1. The query plan is identical, but Trino is much faster. This is
>> > due to the architectural difference between Trino and Hive (on
>> > shuffle-intensive queries): Trino is based on MPP and thus uses the
>> > push model, while Hive uses the pull model. There is not much we can
>> > do about this type of query. (Note that the push model has its own
>> > drawbacks and thus does not always win over the pull model. That's
>> > why Trino is much slower than Hive on many queries.)
>>
>> If we'd like to accelerate those queries, we may be able to enhance
>> `tez.runtime.pipelined-shuffle.enabled`. I've not used this feature,
>> and IMO the priority is lower considering the use case of Apache Hive.
>
> Setting tez.runtime.pipelined-shuffle.enabled to true can be useful for
> a particular type of query, but it produces only a small speedup on
> average (no more than 4 percent) when tested with 10TB TPC-DS.
> Besides, it has its own drawback that it cannot be used with
> speculative execution (which is important for dealing with occasional
> fetch delays).
>
> Pipelined shuffling in Tez is different from pipelined shuffling in
> Trino, which never writes to local disks. In Tez, the outputs of
> mappers are always written to local disks regardless of pipelined
> shuffling. It's just that the output can be written incrementally in
> separate chunks, and each chunk can be shuffled to downstream tasks as
> soon as it is created.
>
> So, this problem can be solved only at the level of the execution
> engine. For me, it's a high-priority problem because Hive in LLAP mode
> comes quite close to Trino in performance and is already a strong
> contender as an interactive query engine.
>
>> > 2. Trino generates a query plan that is clearly more efficient than
>> > Hive's. We made some attempts to find a solution in Hive, but came
>> > to a preliminary conclusion that this would require a significant
>> > change in the query compiler (e.g., if a decision made later during
>> > query compilation is inconsistent with an earlier assumption, retry
>> > with a different assumption until consistency is reached).
>>
>> Interesting. I would like to know more details about those points. We
>> can help with that part, as I am also involved with Trino.
>
> An example is query 29 of TPC-DS. Trino chooses MapJoin while Hive
> chooses DynamicPartitionedHashJoin, and the join orderings are very
> different:
>
> Trino: ((ss + sr) + cs) + i + s
> Hive: (cs + sr) + (ss + i) + s
>
> As a result, Hive shuffles a lot of intermediate data and runs much
> slower than Trino (e.g., 104 seconds vs 15 seconds on 10TB TPC-DS).
> Before trying to find a fix in Hive, I would like to understand why
> Trino chooses a simple yet efficient query plan for query 29. I am not
> actively working on this problem at the moment, and will create a JIRA
> issue when I make some progress. Thanks.
>
> Regards,
>
> --- Sungwoo Park
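For readers who want to experiment with the setting discussed above,
here is a minimal sketch of enabling it for a single Hive-on-Tez
session. This is an assumption-laden sketch, not a verified recipe: the
final-merge property is included because Tez's pipelined shuffle
reportedly requires the final merge in the output to be disabled, and
the query file name `query29.sql` is purely hypothetical.

```shell
# Sketch only: enable Tez pipelined shuffle for one Hive session.
# The final-merge prerequisite is an assumption from Tez's shuffle
# design; verify against your Tez version before relying on it.
# query29.sql is a hypothetical file name.
hive \
  --hiveconf tez.runtime.pipelined-shuffle.enabled=true \
  --hiveconf tez.runtime.enable.final-merge.in.output=false \
  -f query29.sql
```

Also note the caveat from the thread: pipelined shuffle cannot be
combined with speculative execution, so it should be disabled for that
session.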
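The incremental-chunk behavior described in the thread (mapper output
still written to local disk, but in chunks that downstream tasks can
fetch as soon as each chunk is written) can be illustrated with a toy
timing model. This is not Tez code; the chunk count and per-chunk costs
are made-up abstract units chosen only to show how overlapping map
output with fetches shortens the end-to-end time.

```python
# Toy model of non-pipelined vs pipelined shuffle timing.
# All times are abstract units; one mapper, one reducer, serial fetches.

def non_pipelined(chunks, map_cost, fetch_cost):
    # Reducer starts fetching only after the mapper's entire output
    # (all chunks) is on local disk.
    map_done = len(chunks) * map_cost
    return map_done + len(chunks) * fetch_cost

def pipelined(chunks, map_cost, fetch_cost):
    # Chunk i becomes fetchable the moment it is written; the fetch of
    # chunk i starts at max(chunk i ready, previous fetch finished).
    fetch_done = 0.0
    for i in range(len(chunks)):
        ready = (i + 1) * map_cost
        fetch_done = max(ready, fetch_done) + fetch_cost
    return fetch_done

chunks = list(range(8))
print(non_pipelined(chunks, 1.0, 0.5))  # 12.0: map 8.0, then fetch 4.0
print(pipelined(chunks, 1.0, 0.5))      # 8.5: fetches overlap with map
```

The gap between the two numbers is the fetch work hidden behind the map
phase, which is the speedup pipelined shuffle aims for; it says nothing
about the real-world caveats (extra chunk overhead, no speculative
execution) raised in the thread.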