Hi,

We also had to disable CachedStore as it was producing wrong results in our
queries.
I'm sorry I cannot provide more detailed info.

Cheers,

Pau.

Missatge de Takanobu Asanuma <tasan...@apache.org> del dia dc., 28 de febr.
2024 a les 13:59:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long" in
> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> 2024年2月28日(水) 18:49 Sungwoo Park <glap...@gmail.com>:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>> different, so I don't know why Metastore responses are very slow. I can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a result,
>> some queries generate very inefficient plans because of wrong/inaccurate
>> stats.
>>
>> Perhaps this is because not all patches for CachedStore have been merged
>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>> be some way to properly configure CachedStore so that it correctly computes
>> column stats.
>>
>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>> HIVE-21063: Support statistics in cachedStore for transactional table
>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>> constraint
>>
>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>
>> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
>> enabled, please let us know how you configure it.)
>>
>> 2.
>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>> efficient query plans.
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma <tasan...@apache.org>
>> wrote:
>>
>>> Hi Sungwoo Park,
>>>
>>> I'm sorry for the late reply to this old email.
>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>> noticed that the response of the Hive3 MetaStore is very slow.
>>> We suspect that HIVE-14187 might be causing this slowness.
>>> Could you tell me if you have resolved this problem? Are there still any
>>> problems when you enable CachedStore?
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> 2018年6月13日(水) 0:37 Sungwoo Park <glap...@gmail.com>:
>>>
>>>> Hello Hive users,
>>>>
>>>> I am experience a problem with MetaStore in Hive 3.0.
>>>>
>>>> 1. Start MetaStore
>>>> with 
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>>
>>>> 2. Generate TPC-DS data.
>>>>
>>>> 3. TPC-DS queries run okay and produce correct results. E.g., from
>>>> query 1:
>>>> +-------------------+
>>>> |   c_customer_id   |
>>>> +-------------------+
>>>> | AAAAAAAAAAAACHAA  |
>>>> | AAAAAAAAAAAADCAA  |
>>>> | AAAAAAAAAAAADDAA  |
>>>> ...
>>>> | AAAAAAAAAAAILIAA  |
>>>> +-------------------+
>>>> 100 rows selected (69.901 seconds)
>>>>
>>>> However, the query compilation takes long (
>>>> https://issues.apache.org/jira/browse/HIVE-16520).
>>>>
>>>> 4. Now, restart MetaStore with
>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>>>
>>>> 5. TPC-DS queries run okay, but produce wrong results. E.g, from query
>>>> 1:
>>>> +----------------+
>>>> | c_customer_id  |
>>>> +----------------+
>>>> +----------------+
>>>> No rows selected (37.448 seconds)
>>>>
>>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>>>> HiveServer2 produces such log messages:
>>>>
>>>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>>> c_customer_id
>>>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>>> c_customer_id
>>>>
>>>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>>>> Invalid column stats: No of nulls > cardinality
>>>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>>>> Invalid column stats: No of nulls > cardinality
>>>> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>>>> Invalid column stats: No of nulls > cardinality
>>>>
>>>> However, even after computing column stats, queries still return wrong
>>>> results, despite the fact that the above log messages disappear.
>>>>
>>>> I guess I am missing some configuration parameters (because I imported
>>>> hive-site.xml from Hive 2). Any suggestion would be appreciated.
>>>>
>>>> Thanks a lot,
>>>>
>>>> --- Sungwoo Park
>>>>
>>>>

-- 
----------------------------------
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
----------------------------------

Reply via email to