Hi, We also had to disable CachedStore as it was producing wrong results in our queries. I'm sorry I cannot provide more detailed info.
Cheers, Pau. Missatge de Takanobu Asanuma <tasan...@apache.org> del dia dc., 28 de febr. 2024 a les 13:59: > Thanks for your detailed answer! > > In the original email, you reported "the query compilation takes long" in > Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3? > Thank you for sharing the issue with CachedStore and the JIRA tickets. > I will also try out metastore.stats.fetch.bitvector=true. > > Regards, > - Takanobu > > 2024年2月28日(水) 18:49 Sungwoo Park <glap...@gmail.com>: > >> Hello Takanobu, >> >> We did not test with vanilla Hive 3.1.3 and Metastore databases can be >> different, so I don't know why Metastore responses are very slow. I can >> only share some results of testing CachedStore in Metastore. Please note >> that we did not use vanilla Hive 3.1.3 and instead used our own fork of >> Hive 3.1.3 (which applies many additional patches). >> >> 1. >> When CachedStore is enabled, column stats are not computed. As a result, >> some queries generate very inefficient plans because of wrong/inaccurate >> stats. >> >> Perhaps this is because not all patches for CachedStore have been merged >> to Hive 3.1.3. For example, these patches are not merged. Or, there might >> be some way to properly configure CachedStore so that it correctly computes >> column stats. >> >> HIVE-20896: CachedStore fail to cache stats in multiple code paths >> HIVE-21063: Support statistics in cachedStore for transactional table >> HIVE-24258: Data mismatch between CachedStore and ObjectStore for >> constraint >> >> So, we decided that CachedStore should not be enabled in Hive 3.1.3. >> >> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore >> enabled, please let us know how you configure it.) >> >> 2. >> Setting metastore.stats.fetch.bitvector=true can also help generate more >> efficient query plans. >> >> --- Sungwoo >> >> >> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma <tasan...@apache.org> >> wrote: >> >>> Hi Sungwoo Park, >>> >>> I'm sorry for the late reply to this old email. >>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and >>> noticed that the response of the Hive3 MetaStore is very slow. >>> We suspect that HIVE-14187 might be causing this slowness. >>> Could you tell me if you have resolved this problem? Are there still any >>> problems when you enable CachedStore? >>> >>> Regards, >>> - Takanobu >>> >>> 2018年6月13日(水) 0:37 Sungwoo Park <glap...@gmail.com>: >>> >>>> Hello Hive users, >>>> >>>> I am experience a problem with MetaStore in Hive 3.0. >>>> >>>> 1. Start MetaStore >>>> with >>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore. >>>> >>>> 2. Generate TPC-DS data. >>>> >>>> 3. TPC-DS queries run okay and produce correct results. E.g., from >>>> query 1: >>>> +-------------------+ >>>> | c_customer_id | >>>> +-------------------+ >>>> | AAAAAAAAAAAACHAA | >>>> | AAAAAAAAAAAADCAA | >>>> | AAAAAAAAAAAADDAA | >>>> ... >>>> | AAAAAAAAAAAILIAA | >>>> +-------------------+ >>>> 100 rows selected (69.901 seconds) >>>> >>>> However, the query compilation takes long ( >>>> https://issues.apache.org/jira/browse/HIVE-16520). >>>> >>>> 4. Now, restart MetaStore with >>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore. >>>> >>>> 5. TPC-DS queries run okay, but produce wrong results. E.g, from query >>>> 1: >>>> +----------------+ >>>> | c_customer_id | >>>> +----------------+ >>>> +----------------+ >>>> No rows selected (37.448 seconds) >>>> >>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore, >>>> HiveServer2 produces such log messages: >>>> >>>> 2018-06-12T23:50:04,223 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >>>> 2018-06-12T23:50:04,223 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >>>> 2018-06-12T23:50:04,225 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >>>> 2018-06-12T23:50:04,225 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >>>> 2018-06-12T23:50:04,226 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >>>> c_customer_id >>>> 2018-06-12T23:50:04,226 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >>>> c_customer_id >>>> >>>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>> Invalid column stats: No of nulls > cardinality >>>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>> Invalid column stats: No of nulls > cardinality >>>> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>> Invalid column stats: No of nulls > cardinality >>>> >>>> However, even after computing column stats, queries still return wrong >>>> results, despite the fact that the above log messages disappear. >>>> >>>> I guess I am missing some configuration parameters (because I imported >>>> hive-site.xml from Hive 2). Any suggestion would be appreciated. >>>> >>>> Thanks a lot, >>>> >>>> --- Sungwoo Park >>>> >>>> -- ---------------------------------- Pau Tallada Crespí Departament de Serveis Port d'Informació Científica (PIC) Tel: +34 93 170 2729 ----------------------------------