[jira] [Updated] (HIVE-28094) Improve HMS client cache and query cache performance for getTableInternal

Soumyakanti Das (Jira) Mon, 26 Feb 2024 20:21:10 -0800


     [ https://issues.apache.org/jira/browse/HIVE-28094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Soumyakanti Das updated HIVE-28094:
-----------------------------------
    Description: 
Currently we cache calls to the {{getTableInternal}} method in the HMS client 
cache and the query cache. We also cache table IDs in the query cache, but not 
in the HMS client cache.

 

To cache {{getTableInternal}}, we create a CacheKey containing the 
{{GetTableRequest}} object. However, we do not check whether all the necessary 
fields are set in the key. This results in a lot of cache misses, especially 
because we rely on {{validWriteIdList}} not being null and {{tableId}} not 
being -1. The {{GetTableRequest}} object also contains {{catName}}, which is 
not always set. Together, these gaps create duplicate keys for the same table 
and keep the caches from being used efficiently.
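
As a minimal sketch of the idea (not the actual patch), a key could fill in 
optional fields before it is used for lookup, so that two requests for the same 
table map to the same entry. {{TableCacheKeySketch}}, the {{"hive"}} default 
catalog name, and the {{isComplete}} check are hypothetical and introduced here 
only for illustration; the real CacheKey and {{GetTableRequest}} classes are 
not reproduced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a cache key built from a GetTableRequest. */
final class TableCacheKeySketch {
    static final String DEFAULT_CATALOG = "hive"; // assumed default catalog name
    static final long UNSET_TABLE_ID = -1L;       // "not set" marker from the description

    private final String catName;
    private final String dbName;
    private final String tblName;
    private final String validWriteIdList; // null when the caller did not supply it
    private final long tableId;

    TableCacheKeySketch(String catName, String dbName, String tblName,
                        String validWriteIdList, long tableId) {
        // Fill in the catalog so "unset" and "explicit default" produce the same key.
        this.catName = (catName == null || catName.isEmpty()) ? DEFAULT_CATALOG : catName;
        this.dbName = Objects.requireNonNull(dbName, "dbName");
        this.tblName = Objects.requireNonNull(tblName, "tblName");
        this.validWriteIdList = validWriteIdList;
        this.tableId = tableId;
    }

    /** True only when every field the lookup relies on has been set. */
    boolean isComplete() {
        return validWriteIdList != null && tableId != UNSET_TABLE_ID;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TableCacheKeySketch)) return false;
        TableCacheKeySketch k = (TableCacheKeySketch) o;
        return tableId == k.tableId
            && catName.equals(k.catName)
            && dbName.equals(k.dbName)
            && tblName.equals(k.tblName)
            && Objects.equals(validWriteIdList, k.validWriteIdList);
    }

    @Override
    public int hashCode() {
        return Objects.hash(catName, dbName, tblName, validWriteIdList, tableId);
    }

    public static void main(String[] args) {
        Map<TableCacheKeySketch, String> cache = new HashMap<>();
        // The write-id string and table id below are placeholder values.
        cache.put(new TableCacheKeySketch(null, "tpcds", "store_sales", "placeholder", 42L),
                  "table metadata");
        // Same table, catalog given explicitly: still a cache hit after normalization.
        System.out.println(cache.containsKey(
            new TableCacheKeySketch("hive", "tpcds", "store_sales", "placeholder", 42L)));
        // A key with missing fields is reported as incomplete.
        System.out.println(
            new TableCacheKeySketch(null, "tpcds", "store_sales", null, -1L).isComplete());
    }
}
```

A key for which {{isComplete()}} returns false could either be completed before 
lookup or bypass the cache; which of these the patch does is not stated in this 
description.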

 

Moreover, {{getTableInternal}} is called from other APIs that are themselves 
cached, e.g. {{getPartitionsByExprInternal}}, so improvements in its 
performance will positively affect those APIs too.

 

*RESULTS:*

I ran all TPC-DS EXPLAIN CBO queries on my local machine, after cherry-picking 
[HIVE-28083: Enable HMS client cache and HMS query cache for Explain 
plans|https://github.com/apache/hive/pull/5092/commits/41a766d6a51480edb505fd53661a03c63ef3937a].
 Then I analyzed the logs with a simple Python script to get the min, 25th 
percentile, median, 75th percentile, and max of the durations in PERFLOG lines 
matching this pattern:
{code:java}
</PERFLOG method=(\w+) start=\d+ end=\d+ duration=(\d+) from=.* HS2-cache>
{code}
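
The Python script itself is not included. As a rough equivalent, the following 
Java sketch groups {{duration}} values by {{method}} using the pattern above 
and computes the five summary statistics. The class and method names are 
hypothetical, and linear interpolation between ranks is an assumption about how 
the percentiles were computed (it would be consistent with fractional values 
such as 125.5).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log summarizer; not the script used for the reported numbers. */
final class PerfLogSummarySketch {
    // Same pattern as above: group 1 is the method name, group 2 the duration.
    static final Pattern PERFLOG = Pattern.compile(
        "</PERFLOG method=(\\w+) start=\\d+ end=\\d+ duration=(\\d+) from=.* HS2-cache>");

    /** Collects durations per method from every line that contains a match. */
    static Map<String, List<Long>> collect(Iterable<String> lines) {
        Map<String, List<Long>> byMethod = new TreeMap<>();
        for (String line : lines) {
            Matcher m = PERFLOG.matcher(line);
            if (m.find()) {
                byMethod.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                        .add(Long.parseLong(m.group(2)));
            }
        }
        return byMethod;
    }

    /** Percentile for q in [0, 1] over a sorted, non-empty list, linearly interpolated. */
    static double percentile(List<Long> sorted, double q) {
        double pos = q * (sorted.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted.get(lo) + (pos - lo) * (sorted.get(hi) - sorted.get(lo));
    }

    /** Returns {min, 25th percentile, median, 75th percentile, max}. */
    static double[] summarize(List<Long> durations) {
        List<Long> sorted = new ArrayList<>(durations);
        Collections.sort(sorted);
        return new double[] {
            percentile(sorted, 0.0), percentile(sorted, 0.25), percentile(sorted, 0.5),
            percentile(sorted, 0.75), percentile(sorted, 1.0)
        };
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a log file containing PERFLOG lines (hypothetical input).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, List<Long>> e : collect(lines).entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(summarize(e.getValue())));
        }
    }
}
```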
Here are the results.

*WITHOUT the improvements to {{getTableInternal}} method:*
|*API*|*MIN*|*25th*|*MEDIAN*|*75th*|*MAX*|
|*getTable*|2|3|3|4|233|
|*getTableConstraints*|2|4|4|5|22|
|*getPartitionsByExpr*|19|22|25|27|2396|
|*getAggrColStatsFor*|0|125.5|186|284|910|
|*getTableColumnStatistics*|4|6|7|8|454|

Cache Stats:
{code:java}
CacheStats{hitCount=77464, missCount=11919, loadSuccessCount=0, loadFailureCount=0, totalLoadTime=0, evictionCount=0, evictionWeight=0}
{code}
*WITH the improvements to {{getTableInternal}} method:*
|*API*|*MIN*|*25th*|*MEDIAN*|*75th*|*MAX*|
|*getTable*|0|0|0|0|33|
|*getTableConstraints*|3|4|4|5|20|
|*getPartitionsByExpr*|14|16|19|21|2247|
|*getAggrColStatsFor*|0|124.5|187|272.5|936|
|*getTableColumnStatistics*|0|0|0|1|16|

Cache Stats:
{code:java}
CacheStats{hitCount=81044, missCount=11943, loadSuccessCount=0, loadFailureCount=0, totalLoadTime=0, evictionCount=0, evictionWeight=0}
{code}
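We can see that latency drops for {{getTable}}, {{getPartitionsByExpr}}, and 
{{getTableColumnStatistics}}, while {{getTableConstraints}} and 
{{getAggrColStatsFor}} are essentially unchanged, and that the cache 
{{hitCount}} rises from 77464 to 81044 with this patch.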

  was:
Currently we cache calls to {{getTableInternal}} method in HMS client cache and 
query cache. We also cache table ids in the query cache, but not in the HMS 
client cache.

 

To cache {{getTableInternal}}, we create a CacheKey containing the 
{{GetTableRequest}} object. However, we do not check if all the necessary 
fields are set in the key. This results in a lot of cache misses, especially 
because we rely on {{validWriteIdList}} not being null and {{tableId}} not 
being -1. {{GetTableRequest}} object also contains {{catName}} which is not 
always set. All these things result in creating duplicate keys and not using 
the caches efficiently.

 

Moreover, {{getTableInternal}} is called from other APIs that are getting 
cached, e.g. {{getPartitionsByExprInternal}}, so improvements in its 
performance will positively affect other APIs too.

 

RESULTS:

I ran all TPCDS explain cbo queries on my local machine, after cherry-picking 
[HIVE-28083: Enable HMS client cache and HMS query cache for Explain 
plans|https://github.com/apache/hive/pull/5092/commits/41a766d6a51480edb505fd53661a03c63ef3937a].
 Then I analyzed the logs with a simple python script to get min, 25th 
percentile, median, 75th percentile, and max for PERFLOG logs with this pattern:
{code:java}
</PERFLOG method=(\w+) start=\d+ end=\d+ duration=(\d+) from=.* HS2-cache>
{code}
Here are the results.

Without 


> Improve HMS client cache and query cache performance for getTableInternal
> -------------------------------------------------------------------------
>
>                 Key: HIVE-28094
>                 URL: https://issues.apache.org/jira/browse/HIVE-28094
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>
> Currently we cache calls to the {{getTableInternal}} method in the HMS client 
> cache and the query cache. We also cache table IDs in the query cache, but 
> not in the HMS client cache.
>  
> To cache {{getTableInternal}}, we create a CacheKey containing the 
> {{GetTableRequest}} object. However, we do not check whether all the 
> necessary fields are set in the key. This results in a lot of cache misses, 
> especially because we rely on {{validWriteIdList}} not being null and 
> {{tableId}} not being -1. The {{GetTableRequest}} object also contains 
> {{catName}}, which is not always set. Together, these gaps create duplicate 
> keys for the same table and keep the caches from being used efficiently.
>  
> Moreover, {{getTableInternal}} is called from other APIs that are themselves 
> cached, e.g. {{getPartitionsByExprInternal}}, so improvements in its 
> performance will positively affect those APIs too.
>  
> *RESULTS:*
> I ran all TPC-DS EXPLAIN CBO queries on my local machine, after cherry-picking 
> [HIVE-28083: Enable HMS client cache and HMS query cache for Explain 
> plans|https://github.com/apache/hive/pull/5092/commits/41a766d6a51480edb505fd53661a03c63ef3937a].
>  Then I analyzed the logs with a simple Python script to get the min, 25th 
> percentile, median, 75th percentile, and max of the durations in PERFLOG 
> lines matching this pattern:
> {code:java}
> </PERFLOG method=(\w+) start=\d+ end=\d+ duration=(\d+) from=.* HS2-cache>
> {code}
> Here are the results.
> *WITHOUT the improvements to {{getTableInternal}} method:*
> |*API*|*MIN*|*25th*|*MEDIAN*|*75th*|*MAX*|
> |*getTable*|2|3|3|4|233|
> |*getTableConstraints*|2|4|4|5|22|
> |*getPartitionsByExpr*|19|22|25|27|2396|
> |*getAggrColStatsFor*|0|125.5|186|284|910|
> |*getTableColumnStatistics*|4|6|7|8|454|
> Cache Stats:
> {code:java}
> CacheStats{hitCount=77464, missCount=11919, loadSuccessCount=0, loadFailureCount=0, totalLoadTime=0, evictionCount=0, evictionWeight=0}
> {code}
> *WITH the improvements to {{getTableInternal}} method:*
> |*API*|*MIN*|*25th*|*MEDIAN*|*75th*|*MAX*|
> |*getTable*|0|0|0|0|33|
> |*getTableConstraints*|3|4|4|5|20|
> |*getPartitionsByExpr*|14|16|19|21|2247|
> |*getAggrColStatsFor*|0|124.5|187|272.5|936|
> |*getTableColumnStatistics*|0|0|0|1|16|
> Cache Stats:
> {code:java}
> CacheStats{hitCount=81044, missCount=11943, loadSuccessCount=0, loadFailureCount=0, totalLoadTime=0, evictionCount=0, evictionWeight=0}
> {code}
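> We can see that latency drops for {{getTable}}, {{getPartitionsByExpr}}, and 
> {{getTableColumnStatistics}}, while {{getTableConstraints}} and 
> {{getAggrColStatsFor}} are essentially unchanged, and that the cache 
> {{hitCount}} rises from 77464 to 81044 with this patch.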



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
