[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated HIVE-11294:
------------------------------
    Attachment: HIVE-11294.patch

This patch adds caching of aggregated stats in HBase.  It also fundamentally 
changes how cached entries are matched: only exact matches are now accepted, 
rather than the partial matches allowed in the past.  The key for entries in 
the cache is an MD5 sum of the dbname, tablename, and sorted list of partition 
names, which keeps key sizes reasonable and lookups fast.
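
For illustration, a minimal sketch of how such a key can be computed (the 
class and method names here are hypothetical, not taken from the patch):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the key scheme described above: hash the dbname,
// tablename, and sorted partition names down to one fixed-size MD5 key.
public class AggrStatsKey {
  public static byte[] makeKey(String dbName, String tableName,
                               List<String> partNames) throws NoSuchAlgorithmException {
    String[] sorted = partNames.toArray(new String[0]);
    Arrays.sort(sorted);                       // make the key order-independent
    MessageDigest md = MessageDigest.getInstance("MD5");
    md.update(dbName.getBytes(StandardCharsets.UTF_8));
    md.update(tableName.getBytes(StandardCharsets.UTF_8));
    for (String p : sorted) {
      md.update(p.getBytes(StandardCharsets.UTF_8));
    }
    return md.digest();                        // 16 bytes regardless of partition count
  }
}
{code}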

A limited number of entries (10K by default) are still kept in memory for a 
limited time (1 minute by default), to reduce round trips to HBase.
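
Sketched with Guava's cache, assuming it is available on the metastore 
classpath (AggrStats is the existing metastore thrift type; the wrapper class 
is hypothetical):

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.hadoop.hive.metastore.api.AggrStats;

// Illustrative sketch of the in-memory layer: bounded size plus a short TTL,
// so repeated lookups within a minute never leave the process.
public class MemAggrCache {
  private final Cache<ByteBuffer, AggrStats> cache = CacheBuilder.newBuilder()
      .maximumSize(10_000)                     // 10K entries by default
      .expireAfterWrite(1, TimeUnit.MINUTES)   // 1 minute by default
      .build();

  public AggrStats get(byte[] key) {
    return cache.getIfPresent(ByteBuffer.wrap(key));
  }

  public void put(byte[] key, AggrStats stats) {
    cache.put(ByteBuffer.wrap(key), stats);
  }
}
{code}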

Entries in HBase are kept in the cache for 1 week, or until a partition's 
stats are updated or the partition is dropped.  Determining when an aggregate 
needs to be dropped is not straightforward: since the key is an MD5 sum, we 
cannot tell from the key whether an entry covers the partition that was 
updated or dropped.  To deal with this, each entry also contains a bloom 
filter of all of its partition names.  When a partition is updated or dropped 
it is added to a queue.  Every 5 seconds a separate thread takes all of the 
entries from the queue and does a full scan of the cache, using the bloom 
filters to determine whether any of the queued partitions appear in an 
aggregate.  If so, it drops that aggregate entry.  Since this is done with a 
bloom filter there will be some false positives (entries that get dropped when 
they shouldn't be), but the error rate was chosen to be very low (0.1%).  This 
makes the bloom filter larger, but the goal in choosing a bloom filter was to 
minimize processing time rather than to save space.
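
A rough sketch of that sweep, with Guava's BloomFilter standing in for the 
filter stored with each entry and an in-memory list standing in for the HBase 
scan (all names hypothetical):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Sketch of the invalidation scheme described above: drain the queue of
// changed partitions every 5 seconds and drop any aggregate whose bloom
// filter might contain one of them.
public class Invalidator implements Runnable {
  static class CacheEntry {
    // One filter per aggregate entry, built over its partition names.
    BloomFilter<CharSequence> partNames = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000, 0.001); // 0.1% fpp
  }

  private final BlockingQueue<String> changedParts = new LinkedBlockingQueue<>();
  private final List<CacheEntry> entries;   // stand-in for a full scan of the cache

  Invalidator(List<CacheEntry> entries) { this.entries = entries; }

  // Called whenever a partition's stats are updated or the partition is dropped.
  public void partitionChanged(String partName) {
    changedParts.add(partName);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(5000);                   // sweep every 5 seconds
        List<String> batch = new ArrayList<>();
        changedParts.drainTo(batch);
        if (batch.isEmpty()) continue;
        // A false positive here just drops an aggregate that didn't need it.
        entries.removeIf(e -> batch.stream().anyMatch(e.partNames::mightContain));
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}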

All of this means there will be a lag between when a partition is dropped or 
updated and when the aggregate is dropped: less than 5 seconds if the change 
was made on the same HS2 instance, or less than 65 seconds if it was made on 
another instance (the 5 second sweep interval plus the 1 minute in-memory 
TTL).  Given that these are statistics I think that's acceptable.

Ideally we would not drop an aggregate as soon as a single partition is 
dropped or updated.  Instead we should track the number of invalidated 
partitions and only drop the aggregate once that number crosses a threshold, 
say 5%.  Doing this would require implementing the invalidation logic as a 
co-processor rather than as a filter, which is why I didn't take that approach 
to begin with.

> Use HBase to cache aggregated stats
> -----------------------------------
>
>                 Key: HIVE-11294
>                 URL: https://issues.apache.org/jira/browse/HIVE-11294
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: hbase-metastore-branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: HIVE-11294.patch
>
>
> Currently stats are cached only in the memory of the client.  Given that 
> HBase can easily manage the scale of caching aggregated stats we should be 
> using it to do so.


