[ https://issues.apache.org/jira/browse/HIVE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763864#comment-13763864 ]
Sergey Shelukhin commented on HIVE-5189:
----------------------------------------

It seems there are two ways to make this work uniformly for all flavors of getPartitions[WithFoo][ByBar], of which there are several (by filter, by names, by regex, with auth, all partitions, etc.).

First, for each filtering call except by names, add a call that will return the filtered names rather than the partitions; then, for each return flavor, add getPartitionsByNamesWithFoo; the new client will get the names and then control the batching. This has the advantage of less API breakage, but the disadvantage of making two calls in most cases where only one is necessary (typically a small number of partitions is retrieved and OOM is not a problem).

Second, add APIs _with_batching that would take a max count (some APIs already do), as well as the last retrieved partition name, which can then be added as an additional condition (partName > lastName) to all JDO and SQL queries used for retrieval. The client would send that on subsequent calls. This has the disadvantage of requiring slightly more new APIs, and backward-compat code in the client. The old APIs will be deprecated and removed 1-2 versions later. The new APIs can use request-response structs as parameter and return types, which will allow adding args etc. in the future without breaking backward compatibility.

> make batching in partition retrieval in metastore applicable to more methods
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-5189
>                 URL: https://issues.apache.org/jira/browse/HIVE-5189
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>
> As indicated in HIVE-5158, Metastore can OOM if retrieving a large number of
> partitions. For client-side partition filtering, the client applies batching
> (that would avoid that) by sending parts of the filtered name list in
> separate requests according to configuration.
> The batching is not used on the filter pushdown path, or when retrieving all
> partitions (e.g. when the pruner expression is not useful in non-strict
> mode). HIVE-4914 and pushdown improvements will make this problem somewhat
> worse by allowing more requests to go to the server.
> There needs to be some batching scheme (ideally, a somewhat generic one) that
> would be applicable to all these paths.
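
For illustration only, the second option from the comment (a max count plus the last retrieved partition name, applied as partName > lastName) might look roughly like the sketch below. Everything in it is hypothetical: BatchRequest, BatchResponse, FakeStore, getPartitionNamesBatch and fetchAll are made-up names, not existing metastore APIs, and the in-memory sorted set only stands in for a JDO or SQL query. The sketch also assumes the server returns names in sorted order, since the partName > lastName condition only works as a cursor if results are ordered by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PartitionBatchingSketch {

    /** Hypothetical request struct; fields could be added later without changing the method signature. */
    static final class BatchRequest {
        final String lastPartName; // null on the first call
        final int maxParts;        // upper bound on names returned per call

        BatchRequest(String lastPartName, int maxParts) {
            if (maxParts <= 0) {
                throw new IllegalArgumentException("maxParts must be positive");
            }
            this.lastPartName = lastPartName;
            this.maxParts = maxParts;
        }
    }

    /** Hypothetical response struct. */
    static final class BatchResponse {
        final List<String> partNames;

        BatchResponse(List<String> partNames) {
            this.partNames = partNames;
        }
    }

    /** In-memory stand-in for the server side; a real server would push the condition into its query. */
    static final class FakeStore {
        private final NavigableSet<String> names = new TreeSet<>();
        int calls = 0; // number of batch calls served, for illustration

        FakeStore(List<String> partNames) {
            names.addAll(partNames);
        }

        BatchResponse getPartitionNamesBatch(BatchRequest req) {
            calls++;
            // Equivalent of "partName > lastName ORDER BY partName LIMIT maxParts".
            NavigableSet<String> remaining = (req.lastPartName == null)
                    ? names
                    : names.tailSet(req.lastPartName, false);
            List<String> batch = new ArrayList<>();
            for (String name : remaining) {
                if (batch.size() >= req.maxParts) {
                    break;
                }
                batch.add(name);
            }
            return new BatchResponse(batch);
        }
    }

    /** Client loop: resend the last name seen until a short (or empty) batch comes back. */
    static List<String> fetchAll(FakeStore store, int maxParts) {
        List<String> result = new ArrayList<>();
        String last = null;
        while (true) {
            BatchResponse resp = store.getPartitionNamesBatch(new BatchRequest(last, maxParts));
            result.addAll(resp.partNames);
            if (resp.partNames.size() < maxParts) {
                break; // fewer than requested means nothing is left
            }
            last = resp.partNames.get(resp.partNames.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore(Arrays.asList(
                "ds=2013-09-03", "ds=2013-09-01", "ds=2013-09-02"));
        System.out.println(fetchAll(store, 2));
    }
}
```

When the partition count is an exact multiple of the batch size, this loop makes one extra call that returns an empty batch; a response field such as a "has more" flag would avoid it, which is the kind of addition the request-response structs are meant to allow.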