[ https://issues.apache.org/jira/browse/HIVE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763864#comment-13763864 ]

Sergey Shelukhin commented on HIVE-5189:
----------------------------------------

It seems like there are two ways to make this work uniformly for all flavors of
getPartitions[WithFoo][ByBar], of which there are several (by filter, by names,
by regex, with auth, all partitions, etc.).
First, for each filtering call except by names, add a call that will return the
filtered names rather than partitions; then, for each return flavor, add
getPartitionsByNamesWithFoo. The new client will get the names and then control
the batching. This has the advantage of less API breakage, but the disadvantage
of making two calls in most cases where only one is necessary (typically only a
small number of partitions is retrieved and OOM is not a problem).
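A minimal sketch of the first option, assuming a hypothetical client-side interface: `PartitionNameSource`, `listPartitionNamesByFilter`, `getPartitionsByNames`, `NameBatchingClient` and `InMemorySource` are illustrative names, not the actual metastore API, and partitions are represented by their names to keep it self-contained. It shows the two-step pattern: one names-only call, then fetch-by-names requests of at most `batchSize` names each, so no single metastore response is unbounded (the client still accumulates the full result).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metastore facade: a cheap names-only filtering call plus a
// fetch-by-names call. Partitions are modeled as Strings for brevity.
interface PartitionNameSource {
    List<String> listPartitionNamesByFilter(String filter);

    List<String> getPartitionsByNames(List<String> names);
}

final class NameBatchingClient {
    private final PartitionNameSource source;
    private final int batchSize; // would come from client configuration

    NameBatchingClient(PartitionNameSource source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    // First call returns only names; full partitions are then fetched in
    // chunks of at most batchSize names, bounding each server response.
    List<String> getPartitionsByFilter(String filter) {
        List<String> names = source.listPartitionNamesByFilter(filter);
        List<String> result = new ArrayList<>(names.size());
        for (int from = 0; from < names.size(); from += batchSize) {
            int to = Math.min(from + batchSize, names.size());
            result.addAll(source.getPartitionsByNames(names.subList(from, to)));
        }
        return result;
    }
}

// In-memory stand-in used only to exercise the sketch; counts fetch calls.
final class InMemorySource implements PartitionNameSource {
    private final List<String> all;
    int byNamesCalls = 0;

    InMemorySource(List<String> all) {
        this.all = all;
    }

    public List<String> listPartitionNamesByFilter(String filter) {
        List<String> out = new ArrayList<>();
        for (String p : all) {
            if (p.startsWith(filter)) {
                out.add(p);
            }
        }
        return out;
    }

    public List<String> getPartitionsByNames(List<String> names) {
        byNamesCalls++;
        List<String> out = new ArrayList<>();
        for (String n : names) {
            out.add("Partition(" + n + ")");
        }
        return out;
    }
}
```

With 5 matching names and `batchSize = 2`, the client makes one names call followed by 3 fetch calls; the names call is the extra round trip mentioned above, paid even when the result would have fit in a single response.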
Second, add APIs _with_batching that would take a max count (some APIs already
do), as well as the last retrieved partition name, which can then be added as an
additional condition (partName > lastName) to all JDO and SQL queries used for
retrieval. The client would send that name on subsequent calls. This has the
disadvantage of requiring slightly more new APIs, plus backward compatibility
code in the client. Old APIs will be deprecated, and removed 1-2 versions later.
New APIs can use request-response structs as parameter and return types, which
will allow adding args, etc. in the future without breaking backward compatibility.
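A minimal sketch of the second option (keyset-style paging), with hypothetical names throughout: `GetPartitionsRequest`, `GetPartitionsResponse`, `KeysetServer.getPartitionsWithBatching` and `KeysetClient` are illustrative, not existing metastore APIs. An in-memory `TreeSet` stands in for a JDO/SQL query that applies `partName > lastName`, orders by partition name, and limits to the max count; ordering by the same key as the condition is what makes the `lastName` cursor well-defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical request struct: new fields can be added later without
// changing the method signature.
final class GetPartitionsRequest {
    final String lastName; // null on the first call
    final int maxParts;

    GetPartitionsRequest(String lastName, int maxParts) {
        this.lastName = lastName;
        this.maxParts = maxParts;
    }
}

final class GetPartitionsResponse {
    final List<String> partitionNames;

    GetPartitionsResponse(List<String> partitionNames) {
        this.partitionNames = partitionNames;
    }
}

// Server stand-in: the sorted set mimics "WHERE partName > lastName
// ORDER BY partName LIMIT maxParts".
final class KeysetServer {
    private final NavigableSet<String> names;
    int calls = 0;

    KeysetServer(List<String> partitionNames) {
        this.names = new TreeSet<>(partitionNames);
    }

    GetPartitionsResponse getPartitionsWithBatching(GetPartitionsRequest req) {
        calls++;
        NavigableSet<String> tail =
            req.lastName == null ? names : names.tailSet(req.lastName, false);
        List<String> page = new ArrayList<>();
        for (String n : tail) {
            if (page.size() == req.maxParts) {
                break;
            }
            page.add(n);
        }
        return new GetPartitionsResponse(page);
    }
}

final class KeysetClient {
    // Requests pages, sending the last name seen, until a short or empty
    // page shows that everything has been retrieved.
    static List<String> getAllPartitions(KeysetServer server, int maxParts) {
        if (maxParts <= 0) {
            throw new IllegalArgumentException("maxParts must be positive");
        }
        List<String> all = new ArrayList<>();
        String lastName = null;
        while (true) {
            List<String> page = server.getPartitionsWithBatching(
                new GetPartitionsRequest(lastName, maxParts)).partitionNames;
            all.addAll(page);
            if (page.size() < maxParts) {
                return all;
            }
            lastName = page.get(page.size() - 1);
        }
    }
}
```

With 5 partitions and `maxParts = 2`, the client makes 3 calls (2 + 2 + 1 rows). When the total is an exact multiple of `maxParts`, this loop makes one extra call that returns an empty page; a response field indicating whether more rows remain would avoid that, and the request-response structs leave room to add such a field later.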
                
> make batching in partition retrieval in metastore applicable to more methods
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-5189
>                 URL: https://issues.apache.org/jira/browse/HIVE-5189
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>
> As indicated in HIVE-5158, the Metastore can OOM when retrieving a large number
> of partitions. For client-side partition filtering, the client applies batching
> (which avoids that) by sending parts of the filtered name list in separate
> requests, according to configuration.
> Batching is not used on the filter pushdown path, nor when retrieving all
> partitions (e.g. when the pruner expression is not useful in non-strict
> mode). HIVE-4914 and pushdown improvements will make this problem somewhat
> worse by allowing more requests to go to the server.
> There needs to be some batching scheme (ideally, a somewhat generic one) that 
> would be applicable to all these paths.
