[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated HIVE-11294:
------------------------------
    Attachment: HIVE-11294.patch

This patch adds caching of aggregate stats to HBase. It also fundamentally changes how cached entries are matched: only exact matches are now taken, rather than the partial matches done in the past. The key for entries in the cache is an MD5 sum of the dbname, tablename, and sorted list of partition names. This allows for reasonable key sizes and fast lookups.

A limited number of entries (10K by default) are still kept in memory for a limited time (1 min by default) to reduce round trips to HBase. Entries in HBase are kept in the cache for 1 week, or until a partition's stats are updated or the partition is dropped.

Determining when an aggregate needs to be dropped is not straightforward. Since the key is an MD5 sum, we cannot determine from the key alone whether an entry covers the partition that was updated or dropped. To deal with this, each entry also contains a Bloom filter of all the partition names it covers. When a partition is updated or dropped, it is added to a queue. Every 5 seconds a separate thread takes all of the entries from the queue and does a full scan of the cache, using the Bloom filters to determine whether any queued partition matches one of the partitions in an aggregate. If so, it drops that aggregate entry. Because this is done with a Bloom filter there will be some false positives (entries dropped that shouldn't be), but the error rate was chosen to be very low (0.1%). This makes the Bloom filter larger, but the motivation in choosing a Bloom filter was to minimize processing time rather than to save space.

All of this means there will be a lag between when a partition is dropped or updated and when the aggregate is dropped: < 5 seconds if the change was made on the same HS2 instance, or < 65 seconds if made on another instance. Given that these are statistics, I think that's acceptable.

Ideally we would not drop an aggregate as soon as a single partition is dropped or updated. Instead we would track the number of invalidated partitions and only drop the aggregate once it reaches a threshold, say 5%. Doing that would require implementing the invalidation logic as a co-processor rather than as a filter, which is why I didn't do it this way to begin with.

> Use HBase to cache aggregated stats
> -----------------------------------
>
>                 Key: HIVE-11294
>                 URL: https://issues.apache.org/jira/browse/HIVE-11294
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: hbase-metastore-branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: HIVE-11294.patch
>
>
> Currently stats are cached only in the memory of the client. Given that HBase can easily manage the scale of caching aggregated stats we should be using it to do so.
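To make the key scheme concrete, here is a minimal sketch of how such an exact-match key could be computed. This is illustrative only, not the patch's actual code; the class name AggrStatsKey and method makeKey are hypothetical.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the exact-match cache key is an MD5 sum over the
// dbname, tablename, and sorted list of partition names, giving a fixed-size
// key regardless of how many partitions the aggregate covers.
public class AggrStatsKey {
  public static byte[] makeKey(String dbName, String tableName, List<String> partNames) {
    // Sort a copy so the same set of partitions always yields the same key,
    // no matter what order the caller listed them in.
    List<String> sorted = new ArrayList<>(partNames);
    Collections.sort(sorted);
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(dbName.getBytes(StandardCharsets.UTF_8));
      md.update(tableName.getBytes(StandardCharsets.UTF_8));
      for (String part : sorted) {
        md.update(part.getBytes(StandardCharsets.UTF_8));
      }
      return md.digest(); // 16 bytes, usable directly as an HBase row key
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException("MD5 not available", e); // should never happen
    }
  }
}
{code}

A fixed 16-byte key keeps lookup cost constant no matter how many partitions an aggregate spans, which is what makes exact-match lookup fast; the tradeoff, as noted above, is that the key alone can no longer tell you which partitions an entry covers.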
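Likewise, here is a minimal sketch of the queue-and-sweep invalidation described in the comment, under stated assumptions: Guava's BloomFilter stands in for whatever filter implementation the patch uses, an in-memory map stands in for the HBase scan, and the class name AggrStatsInvalidator is hypothetical.

{code:java}
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: updated/dropped partitions are queued, and every
// 5 seconds a background thread scans all cached aggregates, dropping any
// whose Bloom filter claims to contain a queued partition name.
public class AggrStatsInvalidator {
  // Aggregates keyed by hex-encoded MD5 row key; each carries a Bloom filter
  // over the partition names it covers. Stands in for the HBase table.
  private final Map<String, BloomFilter<String>> cache = new ConcurrentHashMap<>();
  private final ConcurrentLinkedQueue<String> invalidated = new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();

  public void start() {
    sweeper.scheduleAtFixedRate(this::sweep, 5, 5, TimeUnit.SECONDS);
  }

  // Called when a partition's stats are updated or the partition is dropped.
  public void invalidate(String partName) {
    invalidated.add(partName);
  }

  private void sweep() {
    // Drain everything queued since the last pass.
    List<String> parts = new ArrayList<>();
    String p;
    while ((p = invalidated.poll()) != null) parts.add(p);
    if (parts.isEmpty()) return;

    // Full scan. mightContain() can return a false positive (an aggregate
    // dropped that needn't be) but never a false negative, so no stale
    // aggregate survives the sweep.
    cache.entrySet().removeIf(e -> {
      for (String part : parts) {
        if (e.getValue().mightContain(part)) return true;
      }
      return false;
    });
  }

  // 0.1% false-positive rate, matching the rate chosen in the comment above.
  public BloomFilter<String> newFilter(int expectedParts) {
    return BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8),
        expectedParts, 0.001);
  }
}
{code}

In the patch itself the sweep is implemented against HBase (as a filter, per the comment), so the removeIf over a local map above is only a stand-in for that scan; the co-processor variant mentioned at the end of the comment would instead count invalidated partitions per entry and defer the drop until a threshold is crossed.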