Sergey Shelukhin created HIVE-11500:
---------------------------------------

             Summary: implement file footer / splits cache in HBase metastore
                 Key: HIVE-11500
                 URL: https://issues.apache.org/jira/browse/HIVE-11500
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin
            Assignee: Sergey Shelukhin


We need to cache footer data for split generation in the HBase metastore. On file systems that support fileId, this data stays valid permanently and only needs to be removed lazily when the ORC file is erased or compacted. We could potentially also cache some information about splits (e.g. grouping based on location, which would remain valid for a short time).

The cache should be queryable by table. Partition predicate pushdown should be supported, and bucket pruning too if it is added.

In later phases, it would be nice to save the results (in the first category above) of expensive work done by jobs, e.g. per-column data size after decompression/decoding, to avoid surprises when ORC encoding is very good or very bad. Perhaps this can even be generated lazily.

Here's a pony: 🐴



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)