Re:Re: Hive's performance for querying the Iceberg table is very poor.

lisoda Tue, 24 Oct 2023 02:22:35 -0700

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see that HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to run all TPC-DS queries on 
Iceberg tables, in addition to normal Hive tables?

At 2023-10-24 11:03:07, "Ayush Saxena" <[email protected]> wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1], which has a bunch of improvements 
such as vectorization. If you want to give it a quick try on Docker, we have a 
Docker image published for that here [2], and Iceberg works out of the box 
there.


Otherwise, feel free to create tickets if you find specific queries or 
scenarios that are problematic; we will be happy to chase them and get them 
sorted.


PS. Not sure about StarRocks, FWIW. To the best of my knowledge, it is not 
something we develop as part of Apache Hive or the Apache Software Foundation, 
so I would refrain from commenting about it on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong <[email protected]> wrote:

Too bad. Tencent Games used StarRocks with Apache Iceberg to power their 
analytics:
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25


On Mon, Oct 23, 2023 at 10:55 AM lisoda <[email protected]> wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



---- Replied Message ----
| From | Albert Wong<[email protected]> |
| Date | 10/24/2023 01:39 |
| To | [email protected] |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. |
I would try http://starrocks.io. StarRocks is an MPP OLAP database that can 
query Apache Iceberg, and it can cache the data for faster performance. It also 
has additional features such as materialized views that span Apache Iceberg, 
Apache Hudi and Apache Hive. Here is a video of a webinar StarRocks did with 
Tabular (authors of Apache Iceberg) on connecting the two products: 
https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda <[email protected]> wrote:

Hi Team,
      I was recently testing Hive queries against an Iceberg table and found 
that the performance is very poor, almost impossible to use in a production 
environment. In addition, join conditions cannot be pushed down to the Iceberg 
partitions.
      I'm using Hive 3.1.3 and Iceberg 1.3.1, with the 1.3.1 Hive runtime jar 
from the Iceberg community.
      I'm very frustrated, because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
      Details: 
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
      I would be grateful if someone could guide me.
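
The join pushdown asked about in this thread is often called dynamic partition 
pruning: the join keys that survive the filter on the small side of the join 
are used to skip partitions on the large side. Below is a minimal Python sketch 
of the idea only. It is not Hive's or Iceberg's implementation, and every name, 
path and value in it (DataFile, join_keys, prune_files, the sample data) is 
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file of a partitioned table."""
    path: str
    partition_value: str  # value of the partition column for all rows in the file


def join_keys(dim_rows, key, predicate):
    """Collect the join-key values of the dimension rows that pass the filter."""
    return {row[key] for row in dim_rows if predicate(row)}


def prune_files(files, keys):
    """Keep only the files whose partition value can match a join key."""
    return [f for f in files if f.partition_value in keys]


# Hypothetical fact table, partitioned by ds.
files = [
    DataFile("sales/ds=2023-10-21/a.parquet", "2023-10-21"),
    DataFile("sales/ds=2023-10-22/b.parquet", "2023-10-22"),
    DataFile("sales/ds=2023-10-23/c.parquet", "2023-10-23"),
]

# Hypothetical dimension table joined on ds and filtered on is_holiday.
dates = [
    {"ds": "2023-10-21", "is_holiday": False},
    {"ds": "2023-10-22", "is_holiday": True},
    {"ds": "2023-10-23", "is_holiday": False},
]

keys = join_keys(dates, "ds", lambda row: row["is_holiday"])
to_scan = prune_files(files, keys)
print([f.path for f in to_scan])
```

Without this step, every partition of the large table is scanned and the join 
discards the unneeded rows afterwards, which is the behaviour the original 
report describes.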
