[ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ]
Paul Yang commented on HIVE-1660:
---------------------------------
HIVE-1660.1.patch is the main patch: it creates a listPartitionNamesByFilter()
method and changes get_partitions_ps() and get_partition_names_ps() to use the
new filter APIs. In addition, the patch adds an optimization that uses a
partition name regex for filtering when the filter is an equality comparison.
HIVE-1660_regex.patch was a small experiment to measure the potential speedup
from filtering on a more complete regex of the partition name. For example,
for a table partitioned on ds and hr, this patch uses a regex like
'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a
table with ~5 million partitions and ~15K partitions per day, getting the
partitions for a single day took ~1s with the regex patch vs. ~10s with the
filter patch. Since a table with 5 million partitions is a very unusual
case, I didn't think the speedup was worth the additional complexity.
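To make the name-regex idea concrete, here is a minimal Java sketch. It is not code from either patch; buildPattern() and filter() are hypothetical helpers, and it assumes partition names have the form key1=val1/key2=val2 with no escaped characters in the values. Given the ordered partition keys and a partial spec, it builds a pattern like the 'ds=2010-10-01/hr=.*' example above, quoting fixed values literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PartitionNameRegex {

    // Hypothetical helper: builds a regex over partition names of the form
    // "key1=val1/key2=val2" from the ordered partition keys and a partial
    // spec. Keys missing from the spec match any value.
    static Pattern buildPattern(List<String> partKeys, Map<String, String> partialSpec) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partKeys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            String key = partKeys.get(i);
            sb.append(Pattern.quote(key)).append('=');
            String value = partialSpec.get(key);
            // Fixed values are quoted so characters like '.' match literally;
            // an unspecified key matches anything up to the next '/'.
            sb.append(value != null ? Pattern.quote(value) : "[^/]*");
        }
        return Pattern.compile(sb.toString());
    }

    // Hypothetical helper: keeps only the names that fully match the pattern.
    static List<String> filter(List<String> names, Pattern p) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (p.matcher(name).matches()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(List.of("ds", "hr"), Map.of("ds", "2010-10-01"));
        List<String> names = List.of(
                "ds=2010-10-01/hr=00", "ds=2010-10-01/hr=23", "ds=2010-10-02/hr=00");
        // Prints [ds=2010-10-01/hr=00, ds=2010-10-01/hr=23]
        System.out.println(filter(names, p));
    }
}
```

The sketch uses [^/]* rather than .* for unspecified keys so a wildcard cannot run across a '/' into the next key; for a trailing key the two behave the same. In the metastore, the pattern would be evaluated by the database rather than in Java, which is where the speedup described above comes from.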
> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
> Key: HIVE-1660
> URL: https://issues.apache.org/jira/browse/HIVE-1660
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Ajay Kidave
> Assignee: Paul Yang
> Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for partition pruning by passing the partition filter to the
> database was added by HIVE-1609. Changing get_partitions_ps to use this could
> improve performance for tables with a large number of partitions. A
> listPartitionNamesByFilter API might be required to implement this for use
> from Hive.