Sergey Shelukhin created HIVE-11500:
---------------------------------------

             Summary: implement file footer / splits cache in HBase metastore
                 Key: HIVE-11500
                 URL: https://issues.apache.org/jira/browse/HIVE-11500
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin
            Assignee: Sergey Shelukhin


We need to cache ORC footer data for split generation in the HBase metastore.
On file systems that support fileId, a cached footer stays valid permanently
and only needs to be removed lazily when the ORC file is deleted or compacted.
We could potentially also cache some information about splits (e.g. grouping
based on location, which would stay valid for a short time).
The cache should be queryable by table and should support partition predicate
pushdown. If bucket pruning is added, it should support that too.
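A minimal in-memory sketch of the lookup shape described above, assuming
footers are grouped as table -> partition -> fileId. The class FooterCache and
its methods are hypothetical, not the HBase metastore API. Partition "pushdown"
is simplified to a predicate over partition names (real pushdown would
evaluate it inside the store), and lazy removal is modeled as a sweep that asks
the caller which fileIds still exist.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;
import java.util.function.Predicate;

/** Hypothetical stand-in for a metastore footer cache (not a real Hive class). */
final class FooterCache {
    // table -> (partition name -> (fileId -> serialized ORC footer))
    private final Map<String, Map<String, Map<Long, byte[]>>> byTable = new HashMap<>();

    /** Footers keyed by fileId never go stale while the file exists, so no TTL is kept. */
    void put(String table, String partition, long fileId, byte[] footer) {
        byTable.computeIfAbsent(table, t -> new HashMap<>())
               .computeIfAbsent(partition, p -> new HashMap<>())
               .put(fileId, footer);
    }

    /** Returns footers only from partitions that pass the filter. */
    Map<Long, byte[]> getFooters(String table, Predicate<String> partitionFilter) {
        Map<Long, byte[]> result = new HashMap<>();
        for (Map.Entry<String, Map<Long, byte[]>> e
                : byTable.getOrDefault(table, Map.of()).entrySet()) {
            if (partitionFilter.test(e.getKey())) {
                result.putAll(e.getValue());
            }
        }
        return result;
    }

    /**
     * Lazy invalidation: drops entries whose fileId no longer exists
     * (e.g. the file was deleted or compacted away). Returns the number removed.
     */
    int evictMissing(String table, LongPredicate fileStillExists) {
        int removed = 0;
        for (Map<Long, byte[]> files : byTable.getOrDefault(table, Map.of()).values()) {
            int before = files.size();
            files.keySet().removeIf(id -> !fileStillExists.test(id));
            removed += before - files.size();
        }
        return removed;
    }
}
```

Because a fileId-keyed footer never changes, the sketch only removes entries
when a sweep finds the file gone, rather than expiring them on a timer.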

In later phases, it would be nice to also save results of expensive work done
by jobs (permanently valid data, like the first category above), e.g. per-column
data size after decompression/decoding, to avoid surprises when ORC encoding is
very good or very bad. Perhaps this data could even be generated lazily.
Here's a pony: 🐴



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
