[jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database

Paul Yang (JIRA) Wed, 13 Oct 2010 15:56:58 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ]


Paul Yang commented on HIVE-1660:
---------------------------------

HIVE-1660.1.patch is the main patch. It creates a listPartitionNamesByFilter() 
method and fixes get_partitions_ps() and get_partition_names_ps() to use the 
new filter APIs. In addition, the patch makes an optimization to use a 
partition name regex for filtering in cases of equality comparisons.

HIVE-1660_regex.patch was a small experiment to test the potential speedup 
from filtering on a more complete regex of the partition name. For example, 
for a table partitioned on ds and hr, this patch uses a regex like 
'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a 
table with ~5 million partitions and ~15K partitions a day, getting the 
partitions for a single day took ~1s with the regex patch vs ~10s with the 
filter patch. Since a table with 5 million partitions is a very unusual 
case, I didn't think the speedup was worth the additional complexity.
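
To illustrate the regex idea, here is a minimal Python sketch of turning a partial partition specification into a partition-name regex and applying it to a list of names. This is not the code in either patch: the function names are hypothetical, the matching is done in memory rather than in the database, and it assumes partition values are plain strings with no escaped characters. It also uses '[^/]*' instead of '.*' for unspecified keys so that a wildcard cannot span more than one partition column.

```python
import re


def partial_spec_to_regex(partition_keys, partial_spec):
    """Build a regex over partition names such as 'ds=2010-10-01/hr=12'.

    Keys present in partial_spec must match their value exactly; keys that
    are absent match any value that does not contain '/'.
    (Hypothetical helper, for illustration only.)
    """
    parts = []
    for key in partition_keys:
        if key in partial_spec:
            parts.append(re.escape(key) + "=" + re.escape(partial_spec[key]))
        else:
            parts.append(re.escape(key) + "=[^/]*")
    return "/".join(parts)


def filter_partition_names(names, partition_keys, partial_spec):
    """Return the partition names that match the partial specification."""
    pattern = re.compile(partial_spec_to_regex(partition_keys, partial_spec))
    # fullmatch ensures 'ds=2010-10-01' does not also match 'ds=2010-10-011'.
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    names = [
        "ds=2010-10-01/hr=00",
        "ds=2010-10-01/hr=23",
        "ds=2010-10-02/hr=00",
    ]
    for name in filter_partition_names(names, ["ds", "hr"], {"ds": "2010-10-01"}):
        print(name)
```

Because the whole partition name is matched by a single pattern, a specification with several partition columns still needs only one comparison per name.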

> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
>                 Key: HIVE-1660
>                 URL: https://issues.apache.org/jira/browse/HIVE-1660
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Ajay Kidave
>            Assignee: Paul Yang
>         Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the 
> database is added by HIVE-1609. Changing get_partitions_ps to use this could 
> result in a performance improvement for tables having a large number of 
> partitions. A listPartitionNamesByFilter API might be required for 
> implementing this for use from Hive.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
