[ 
https://issues.apache.org/jira/browse/HBASE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Mattingly resolved HBASE-28963.
-----------------------------------
    Release Note: The horizontal scalability of the Quotas refresh chore was 
improved. A side effect of this change is that a Quotas cache miss no longer 
triggers an immediate refresh of the cache.
      Resolution: Fixed

> Updating Quota Factors is too expensive
> ---------------------------------------
>
>                 Key: HBASE-28963
>                 URL: https://issues.apache.org/jira/browse/HBASE-28963
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.2
>
>         Attachments: image-2024-11-06-12-06-44-317.png, 
> quota-refresh-hmaster.png
>
>
> My company is running Quotas across a few hundred clusters of varied size. 
> One cluster has hundreds of servers and tens of thousands of regions. We 
> noticed that the HMaster was quite busy for this cluster, and after some 
> investigation we realized that RegionServers were hammering the HMaster's 
> ClusterMetrics endpoint to facilitate the refreshing of table machine quota 
> factors.
> There are a few things that we could do here. In a perfect world, I think 
> the RegionServers would have better P2P communication of region states and 
> whatever else is necessary to derive new quota factors. Relying solely on 
> the HMaster for this coordination creates a tricky bottleneck for the 
> horizontal scalability of clusters.
> That said, I think that a simpler and preferable initial step would be to 
> make our code a bit more cost-conscious. At my company, for example, we don't 
> even define any table-scoped quotas. Without any table-scoped quotas in the 
> cache, our cache could be much more thoughtful about the work that it chooses 
> to do on each refresh. So I'm proposing that we check [the size of the 
> tableQuotaCache 
> keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418]
>  earlier, and use this inference to determine what ClusterMetrics we bother 
> to fetch.
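
The proposal quoted above can be sketched roughly as follows. This is only an
illustration of the idea, not the code from the actual fix: the class name
QuotaRefreshSketch, the MetricOption enum, and the optionsToFetch method are
all hypothetical stand-ins, and the sketch assumes that one cheap metric is
always needed while the per-table metric is only needed when table-scoped
quotas are cached.

```java
import java.util.EnumSet;
import java.util.Set;

public class QuotaRefreshSketch {
  // Hypothetical stand-in for the kinds of cluster metrics a refresh could request.
  // These are NOT the real HBase option names.
  enum MetricOption { REGION_SERVER_COUNT, TABLE_REGION_COUNTS }

  // Decide which metrics to request, based on the keys currently held in a
  // (hypothetical) table quota cache.
  static EnumSet<MetricOption> optionsToFetch(Set<String> tableQuotaCacheKeys) {
    // Assumption: the cheap server-count metric is always required.
    EnumSet<MetricOption> options = EnumSet.of(MetricOption.REGION_SERVER_COUNT);
    // Only ask for the expensive per-table data if a table-scoped quota exists.
    if (!tableQuotaCacheKeys.isEmpty()) {
      options.add(MetricOption.TABLE_REGION_COUNTS);
    }
    return options;
  }

  public static void main(String[] args) {
    // No table-scoped quotas cached: only the cheap metric is requested.
    System.out.println(optionsToFetch(Set.of()));
    // One table-scoped quota cached: the per-table metric is requested too.
    System.out.println(optionsToFetch(Set.of("ns:table1")));
  }
}
```

The point of checking the key set first is that a cluster with no table-scoped
quotas never pays for the per-table data on any refresh.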



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
