[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697664#comment-14697664 ]
Sergey Shelukhin commented on HIVE-11500:
-----------------------------------------

Actually, the main reason all these calls exist for partitions is that they take arguments instead of following a request-response pattern, which makes it impossible to change the signature in a backward-compatible manner.
I will happily refactor these calls to be generic, or deprecate them in favor of generic calls and remove them later, if the need arises.

> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
>                 Key: HIVE-11500
>                 URL: https://issues.apache.org/jira/browse/HIVE-11500
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation
> (which, on FSes that support fileId, will be valid permanently and only needs
> to be removed lazily when the ORC file is erased or compacted), and potentially
> even some information about splits (e.g. grouping based on location that
> would be good for some short time), in HBase metastore.
> -It should be queryable by table. Partition predicate pushdown should be
> supported. If bucket pruning is added, that too.- Given that we cannot cache
> file lists (we have to check the FS for new/changed files anyway), and the
> difficulty of passing data about partitions/etc. to split generation
> compared to paths, we will probably just filter by paths and fileIds. It
> might be different for splits.
> In later phases, it would be nice to save the (first category above) results
> of expensive work done by jobs, e.g. data size after decompression/decoding
> per column, etc. to avoid surprises when ORC encoding is very good, or very
> bad. Perhaps it can even be lazily generated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
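
As an illustration of the request-response point made in the comment, here is a minimal Python sketch, not the actual metastore API. The names `FileMetadataRequest`, `FileMetadataResponse`, `get_file_metadata`, and the `include_missing` field are all hypothetical, and a plain dict stands in for the HBase-backed cache keyed by fileId. The idea is that a call taking one request object can gain optional fields later without its signature changing, so older callers keep working.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileMetadataRequest:
    """Hypothetical request object; not a real Hive metastore type."""
    file_ids: List[int]
    # Field assumed to be added in a later revision.
    # Its default value keeps callers written against the old version working.
    include_missing: bool = False


@dataclass
class FileMetadataResponse:
    """Hypothetical response object; it can grow new fields the same way."""
    footers: Dict[int, Optional[bytes]] = field(default_factory=dict)


def get_file_metadata(request: FileMetadataRequest,
                      store: Dict[int, bytes]) -> FileMetadataResponse:
    """Look up cached footers by fileId using a single request argument."""
    response = FileMetadataResponse()
    for file_id in request.file_ids:
        footer = store.get(file_id)
        # Misses are reported only when the caller opts in via the new field.
        if footer is not None or request.include_missing:
            response.footers[file_id] = footer
    return response


# A dict stands in for the cache; fileId 3 is deliberately absent.
store = {1: b"footer-1", 2: b"footer-2"}

# A caller that predates include_missing still works unchanged.
old_style = get_file_metadata(FileMetadataRequest(file_ids=[1, 3]), store)

# A newer caller opts in to the added behavior through the same call.
new_style = get_file_metadata(
    FileMetadataRequest(file_ids=[1, 3], include_missing=True), store)
```

Python's default arguments blur the contrast somewhat. In an RPC interface where each call has a fixed argument list, the equivalent change would typically mean adding a new call, which is the proliferation the comment describes.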