Please ensure hive.stats.autogather is enabled as well.
On Fri, Nov 10, 2023, 2:57 PM Denys Kuzmenko wrote:
> `hive.iceberg.stats.source` controls where the stats should be sourced
> from. When it's set to iceberg (default), we should go directly to iceberg
> and bypass HMS.
>
`hive.iceberg.stats.source` controls where the stats should be sourced from.
When it's set to iceberg (default), we should go directly to iceberg and bypass
HMS.
Can you please check this property? We need to ensure it is set to true.
set hive.compute.query.using.stats=true;
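For reference, the settings mentioned in this thread could be combined in one Hive session roughly as sketched below. This is only an illustration, not a verified fix: the table name `my_iceberg_table` is a hypothetical placeholder, and the assumption that `EXPLAIN` shows only a fetch stage when the count is answered from stats should be checked against your Hive version.

```sql
-- Allow Hive to answer simple aggregates such as count(*) from stats.
set hive.compute.query.using.stats=true;
-- Gather basic stats automatically on writes done through Hive.
set hive.stats.autogather=true;
-- Source the stats from Iceberg directly (stated as the default in this thread).
set hive.iceberg.stats.source=iceberg;

-- my_iceberg_table is a hypothetical name; substitute your own table.
-- If the stats path is used, the plan is expected to contain no scan stage.
explain select count(*) from my_iceberg_table;
select count(*) from my_iceberg_table;
```

As noted elsewhere in the thread, this optimization does not apply when the table has delete files.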
In addition, it looks like the table created by Spark has a lot of data. Can you
create a new table, insert a few values into it with Spark, and then create &
count(*) this location_based_tab
Could you please provide detailed steps to reproduce this issue? For example,
how did you create the table?
Thanks,
Butao Zhang
Replied Message
| From | lisoda |
| Date | 11/9/2023 14:25 |
| To | |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is very poor. |
In
Hi lisoda. You can check this ticket,
https://issues.apache.org/jira/browse/HIVE-27347, which uses Iceberg basic
stats to optimize count(*) queries. Note: it does not take effect if the table
has delete files.
Thanks,
Butao Zhang
Replied Message
| From | lisoda |
| Date | 11/9/2023 10:43 |
HIVE-27734 is in progress. As far as I can see, there is a POC attached to the
ticket, so I believe we should have it in 2-3 weeks.
> Also, after the release of 4.0.0, will we be able to do all TPCDS queries
> on ICEBERG except for normal HIVE tables?
Yep, I believe most of the TPCDS queries would be supported even