Hi Pau and Sungwoo, Thanks for sharing the information.
We tested a set of simple queries which just referenced the Hive table and didn't execute any Hive jobs. The result is below. No. Version rawstore.impl connectionPoolingType HIVE-14187 QueryTime -------------------------------------------------- 1 1.2.1 ObjectStore None Not Applied 11:38 2 3.1.3 ObjectStore None Applied 34:00 3 3.1.3 CachedStore None Applied 25:00 4 3.1.3 ObjectStore HikariCP Applied 21:10 5 3.1.3 CachedStore HikariCP Applied 14:30 6 3.1.3 ObjectStore None Reverted 13:00 7 3.1.3 ObjectStore HikariCP Reverted 11:23 -------------------------------------------------- Initially, we encountered an issue of Hive MetaStore slowness when we upgraded from environment No.1 to No.2. As shown in the table, environment No.2 showed the worst test results. A unique aspect of our environment is that we don't use connection pooling. After some investigation, we thought that the combination of HIVE-14187 and connectionPoolingType=None was negatively impacting performance. The fastest case in our tests was when we reverted HIVE-14187 and set connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType set to None, the environment where we reverted HIVE-14187 still performed reasonably well (see No.6). Please note our investigation is still ongoing and we haven't yet come to a conclusion. Regards, - Takanobu 2024年2月29日(木) 12:18 Sungwoo Park <glap...@gmail.com>: > We didn't make any other attempt to fix the problem and just decided not > to use CachedStore. However, I think our installation of Metastore based on > Hive 3.1.3 is running without any serious problems. > > Could you share how long it takes to compile typical queries in your > environment (with Hive 1 and with Hive 3)? > > FYI, in our environment, sometimes it takes about 10 seconds to compile a > query on TPC-DS 10TB datasets. Specifically, the average compilation time > of 103 queries is 1.7 seconds (as reported by Hive), and the longest > compilation time is 9.6 seconds (query 49). The compilation time includes > the time for accessing Metastore. > > Thanks, > > --- Sungwoo > > > On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma <tasan...@apache.org> > wrote: > >> Thanks for your detailed answer! >> >> In the original email, you reported "the query compilation takes long" in >> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3? >> Thank you for sharing the issue with CachedStore and the JIRA tickets. >> I will also try out metastore.stats.fetch.bitvector=true. >> >> Regards, >> - Takanobu >> >> 2024年2月28日(水) 18:49 Sungwoo Park <glap...@gmail.com>: >> >>> Hello Takanobu, >>> >>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be >>> different, so I don't know why Metastore responses are very slow. I can >>> only share some results of testing CachedStore in Metastore. Please note >>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of >>> Hive 3.1.3 (which applies many additional patches). >>> >>> 1. >>> When CachedStore is enabled, column stats are not computed. As a result, >>> some queries generate very inefficient plans because of wrong/inaccurate >>> stats. >>> >>> Perhaps this is because not all patches for CachedStore have been merged >>> to Hive 3.1.3. For example, these patches are not merged. Or, there might >>> be some way to properly configure CachedStore so that it correctly computes >>> column stats. >>> >>> HIVE-20896: CachedStore fail to cache stats in multiple code paths >>> HIVE-21063: Support statistics in cachedStore for transactional table >>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for >>> constraint >>> >>> So, we decided that CachedStore should not be enabled in Hive 3.1.3. >>> >>> (If anyone is running Hive Metastore 3.1.3 in production with >>> CachedStore enabled, please let us know how you configure it.) >>> >>> 2. >>> Setting metastore.stats.fetch.bitvector=true can also help generate more >>> efficient query plans. >>> >>> --- Sungwoo >>> >>> >>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma <tasan...@apache.org> >>> wrote: >>> >>>> Hi Sungwoo Park, >>>> >>>> I'm sorry for the late reply to this old email. >>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and >>>> noticed that the response of the Hive3 MetaStore is very slow. >>>> We suspect that HIVE-14187 might be causing this slowness. >>>> Could you tell me if you have resolved this problem? Are there still >>>> any problems when you enable CachedStore? >>>> >>>> Regards, >>>> - Takanobu >>>> >>>> 2018年6月13日(水) 0:37 Sungwoo Park <glap...@gmail.com>: >>>> >>>>> Hello Hive users, >>>>> >>>>> I am experience a problem with MetaStore in Hive 3.0. >>>>> >>>>> 1. Start MetaStore >>>>> with >>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore. >>>>> >>>>> 2. Generate TPC-DS data. >>>>> >>>>> 3. TPC-DS queries run okay and produce correct results. E.g., from >>>>> query 1: >>>>> +-------------------+ >>>>> | c_customer_id | >>>>> +-------------------+ >>>>> | AAAAAAAAAAAACHAA | >>>>> | AAAAAAAAAAAADCAA | >>>>> | AAAAAAAAAAAADDAA | >>>>> ... >>>>> | AAAAAAAAAAAILIAA | >>>>> +-------------------+ >>>>> 100 rows selected (69.901 seconds) >>>>> >>>>> However, the query compilation takes long ( >>>>> https://issues.apache.org/jira/browse/HIVE-16520). >>>>> >>>>> 4. Now, restart MetaStore with >>>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore. >>>>> >>>>> 5. TPC-DS queries run okay, but produce wrong results. E.g, from query >>>>> 1: >>>>> +----------------+ >>>>> | c_customer_id | >>>>> +----------------+ >>>>> +----------------+ >>>>> No rows selected (37.448 seconds) >>>>> >>>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore, >>>>> HiveServer2 produces such log messages: >>>>> >>>>> 2018-06-12T23:50:04,223 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >>>>> 2018-06-12T23:50:04,223 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >>>>> 2018-06-12T23:50:04,225 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >>>>> 2018-06-12T23:50:04,225 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >>>>> 2018-06-12T23:50:04,226 WARN [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >>>>> c_customer_id >>>>> 2018-06-12T23:50:04,226 INFO [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >>>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >>>>> c_customer_id >>>>> >>>>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>>> Invalid column stats: No of nulls > cardinality >>>>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>>> Invalid column stats: No of nulls > cardinality >>>>> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad >>>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >>>>> Invalid column stats: No of nulls > cardinality >>>>> >>>>> However, even after computing column stats, queries still return wrong >>>>> results, despite the fact that the above log messages disappear. >>>>> >>>>> I guess I am missing some configuration parameters (because I imported >>>>> hive-site.xml from Hive 2). Any suggestion would be appreciated. >>>>> >>>>> Thanks a lot, >>>>> >>>>> --- Sungwoo Park >>>>> >>>>>