[jira] [Commented] (HIVE-2327) Optimize REGEX UDFs with constant parameter information

Alexander Pivovarov (JIRA) Thu, 14 May 2015 15:20:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544440#comment-14544440
 ]


Alexander Pivovarov commented on HIVE-2327:
-------------------------------------------

Is there a standard approach to UDF performance testing in Hive?
Are there any tables that are usually used for UDF performance tests in Hive?

If not, I can create a table with 1 million rows and a string column t1 
containing random strings of 100 characters each.
Then I would run something like:
{code}
select count(*) from t where t.t1 regexp '.*abc.*';
{code}
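
A minimal sketch of how such test data could be generated, assuming a plain 
text file with one string per line is later loaded into the table t. The class 
name RandomRows, the fixed seed, and the lowercase alphabet are illustrative 
choices, not something taken from the issue or from Hive:

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class RandomRows {
    // Assumption: lowercase ASCII letters are "random enough" for this test.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // Builds one random string of the requested length from ALPHABET.
    static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Row count defaults to 1 million; the string length is fixed at 100.
        int rows = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        Random rnd = new Random(42L); // fixed seed so runs are repeatable
        try (PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8)))) {
            for (int i = 0; i < rows; i++) {
                out.println(randomString(rnd, 100));
            }
        }
    }
}
```

The output could be redirected to a file and loaded into a single-column table, 
for example with LOAD DATA LOCAL INPATH, before running the query above.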

> Optimize REGEX UDFs with constant parameter information
> -------------------------------------------------------
>
>                 Key: HIVE-2327
>                 URL: https://issues.apache.org/jira/browse/HIVE-2327
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Adam Kramer
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-2327.01.patch, HIVE-2327.2.patch
>
>
> There are a lot of UDFs which would show major performance differences if one 
> assumes that some of their arguments are constant.
> Consider, for example, any UDF that takes a regular expression as input: This 
> can be compiled once (fast) if it's a constant, or once per row (wicked slow) 
> if it's not a constant.
> Or, consider any UDF that reads from a file and/or takes a filename as input; 
> it would have to re-read the whole file if the filename changes.
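
The difference described above can be sketched in plain Java. This is only an 
illustration of the idea, not the code in the attached patches: the class 
RegexCacheSketch, its method names, and the compileCount field are hypothetical, 
and using Matcher.find() for the match is an assumption.

```java
import java.util.regex.Pattern;

public class RegexCacheSketch {
    private String lastRegex;     // regex string seen on the previous call
    private Pattern lastPattern;  // compiled form of lastRegex
    int compileCount;             // counts compilations, for illustration only

    // Naive version: compiles the regex again for every row.
    static boolean matchPerRow(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    // Cached version: recompiles only when the regex string changes,
    // so a constant regex is compiled once for the whole scan.
    boolean matchCached(String value, String regex) {
        if (!regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastPattern.matcher(value).find();
    }

    public static void main(String[] args) {
        RegexCacheSketch udf = new RegexCacheSketch();
        String[] rows = {"xxabcxx", "xyz", "abc"};
        for (String row : rows) {
            System.out.println(row + " -> " + udf.matchCached(row, ".*abc.*"));
        }
        System.out.println("compilations: " + udf.compileCount);
    }
}
```

When the argument is known to be constant at query initialization time, the 
per-call string comparison could be skipped as well by compiling the pattern 
up front.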



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
