[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ]
Paul Yang commented on HIVE-1660:
---------------------------------

HIVE-1660.1.patch is the main patch. It creates a listPartitionNamesByFilter() method and updates get_partitions_ps() and get_partition_names_ps() to use the new filter APIs. In addition, the patch adds an optimization that uses a partition name regex for filtering in cases of equality comparisons.

HIVE-1660_regex.patch was a small experiment to test the potential speedup from filtering on a more complete regex of the partition name. For example, for a table partitioned on ds and hr, this patch uses a regex like 'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a table with ~5 million partitions and ~15K partitions per day, getting the partitions for a single day took ~1s with the regex patch vs. ~10s with the filter patch. Since a table with 5 million partitions is a very unusual case, I didn't think the speedup was worth the additional complexity.

> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
>                 Key: HIVE-1660
>                 URL: https://issues.apache.org/jira/browse/HIVE-1660
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Ajay Kidave
>            Assignee: Paul Yang
>         Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the
> database is added by HIVE-1609. Changing get_partitions_ps to use this could
> result in a performance improvement for tables having a large number of
> partitions. A listPartitionNamesByFilter API might be required for
> implementing this for use from Hive.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
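The regex idea from the comment ('ds=2010-10-01/hr=.*' to select one day's partitions) can be illustrated with a short Java sketch. It builds a pattern over partition names of the form `k1=v1/k2=v2/...` from an ordered list of partition keys and a partial list of values, treating an empty value as "any value". The class and method names (`PartitionNameRegex`, `buildRegex`, `filterNames`) are hypothetical and not part of either patch. It assumes the values contain no characters that Hive would escape in a partition path, and it filters in memory purely for illustration; it does not reproduce how the patches push filtering down to the metastore database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not code from HIVE-1660.1.patch or HIVE-1660_regex.patch.
public class PartitionNameRegex {

    /**
     * Builds a regex over partition names such as "ds=2010-10-01/hr=00".
     * A missing or empty value in partialVals acts as a wildcard for that key.
     * Assumption: keys and values contain no characters Hive escapes in path names.
     */
    static Pattern buildRegex(List<String> keys, List<String> partialVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                sb.append('/'); // separator between key=value segments
            }
            sb.append(Pattern.quote(keys.get(i))).append('=');
            String v = i < partialVals.size() ? partialVals.get(i) : null;
            if (v == null || v.isEmpty()) {
                // Wildcard: any value that stays within this path segment.
                sb.append("[^/]*");
            } else {
                // Equality comparison: match the literal value exactly.
                sb.append(Pattern.quote(v));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns the names that fully match the regex, preserving input order. */
    static List<String> filterNames(List<String> names, Pattern regex) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (regex.matcher(name).matches()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("ds", "hr");
        List<String> names = List.of(
                "ds=2010-10-01/hr=00",
                "ds=2010-10-01/hr=01",
                "ds=2010-10-02/hr=00");
        // Same intent as 'ds=2010-10-01/hr=.*': every hour of one day.
        Pattern oneDay = buildRegex(keys, List.of("2010-10-01", ""));
        System.out.println(filterNames(names, oneDay));
        // prints [ds=2010-10-01/hr=00, ds=2010-10-01/hr=01]
    }
}
```

Using `[^/]*` instead of `.*` for wildcard keys keeps a wildcard from spanning into the next `key=value` segment, which matters when the unspecified key is not the last one in the partition spec.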