[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544130#comment-14544130 ]

Ashutosh Chauhan commented on HIVE-2327:
----------------------------------------

This patch ports the regex UDF to the GenericUDF framework, which in itself is a
good enough reason to commit it, since we want all our UDFs to move to that
framework eventually. But I am skeptical that it improves performance; it may
actually regress. The reason is that the optimization of not repeatedly
compiling the regex is already present in the previous version, while the new
version introduces additional branches, virtual function calls, and actual logic
computation in evaluate().
Can you do a before-and-after performance comparison for this change?
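To make the trade-off concrete, here is a rough, self-contained Java sketch (not Hive code; the class and method names are hypothetical) of the two strategies being compared. The "before" strategy assumes the previous version avoids recompilation by remembering the last regex string and recompiling only when it changes; the "after" strategy compiles a known-constant pattern once up front. Both still pay some per-row cost: a string comparison in the first case, method dispatch and branching in the real GenericUDF. The timing loop uses System.nanoTime and is only indicative; a real comparison should exercise the actual Hive UDF classes under a proper harness such as JMH.

```java
import java.util.regex.Pattern;

public class RegexStrategies {

    /** Hypothetical "before" strategy: cache the last regex and recompile only when it changes. */
    static final class LastPatternCache {
        private String lastRegex;
        private Pattern pattern;
        int compiles = 0;

        boolean matches(String s, String regex) {
            // Per-row cost: a null check and a string comparison, even when the regex never changes.
            if (pattern == null || !regex.equals(lastRegex)) {
                pattern = Pattern.compile(regex);
                lastRegex = regex;
                compiles++;
            }
            return pattern.matcher(s).find();
        }
    }

    /** Hypothetical "after" strategy: the regex is known to be constant, so compile it once. */
    static final class ConstantPattern {
        private final Pattern pattern;

        ConstantPattern(String constantRegex) {
            this.pattern = Pattern.compile(constantRegex);
        }

        boolean matches(String s) {
            return pattern.matcher(s).find();
        }
    }

    public static void main(String[] args) {
        String regex = "^[a-z]+\\d{2,}$";
        String[] rows = {"abc123", "xyz", "hive99", "A1"};
        int iterations = 200_000;

        LastPatternCache before = new LastPatternCache();
        ConstantPattern after = new ConstantPattern(regex);

        // Time the "before" strategy over the same rows.
        long t0 = System.nanoTime();
        int hitsBefore = 0;
        for (int i = 0; i < iterations; i++) {
            if (before.matches(rows[i % rows.length], regex)) hitsBefore++;
        }
        // Time the "after" strategy over the same rows.
        long t1 = System.nanoTime();
        int hitsAfter = 0;
        for (int i = 0; i < iterations; i++) {
            if (after.matches(rows[i % rows.length])) hitsAfter++;
        }
        long t2 = System.nanoTime();

        System.out.printf("before: %d ms, %d hits, %d compiles%n",
                (t1 - t0) / 1_000_000, hitsBefore, before.compiles);
        System.out.printf("after:  %d ms, %d hits%n", (t2 - t1) / 1_000_000, hitsAfter);
    }
}
```

Because both loops run back to back in one JVM with no warm-up isolation, JIT effects can easily dominate the printed times at this scale; treat them as a rough signal only.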

> Optimize REGEX UDFs with constant parameter information
> -------------------------------------------------------
>
>                 Key: HIVE-2327
>                 URL: https://issues.apache.org/jira/browse/HIVE-2327
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Adam Kramer
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-2327.01.patch, HIVE-2327.2.patch
>
>
> There are a lot of UDFs which would show major performance differences if one 
> assumes that some of their arguments are constant.
> Consider, for example, any UDF that takes a regular expression as input: This 
> can be compiled once (fast) if it's a constant, or once per row (wicked slow) 
> if it's not a constant.
> Or, consider any UDF that reads from a file and/or takes a filename as input; 
> it would have to re-read the whole file if the filename changes.
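The idea in the description can be sketched without any Hive classes: decide once, when the function is set up, whether an argument is constant, and if so derive the expensive value (a compiled Pattern, or a file's contents) a single time. The names below (ConstantArgumentSketch, DerivedArgument, the loads counter) are hypothetical. In Hive the constant check would happen in GenericUDF.initialize() by inspecting the argument's ObjectInspector (for example, whether it is a ConstantObjectInspector), which this sketch does not model.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.regex.Pattern;

public class ConstantArgumentSketch {

    /**
     * Hypothetical wrapper for an argument whose derived value (e.g. a compiled
     * Pattern or a file's contents) is expensive to compute.
     */
    static final class DerivedArgument<K, V> {
        private final Function<K, V> loader;
        private final boolean constant;
        private K lastKey;
        private V value;
        int loads = 0;

        /** If the argument is constant, its value is passed here and loaded exactly once. */
        DerivedArgument(Function<K, V> loader, boolean constant, K constantKey) {
            this.loader = loader;
            this.constant = constant;
            if (constant) {
                this.value = load(constantKey);
            }
        }

        /** Per-row access: constants return the precomputed value; otherwise reload only when the key changes. */
        V get(K rowKey) {
            if (constant) {
                return value;
            }
            if (loads == 0 || !Objects.equals(rowKey, lastKey)) {
                value = load(rowKey);
            }
            return value;
        }

        private V load(K key) {
            loads++;
            lastKey = key;
            return loader.apply(key);
        }
    }

    public static void main(String[] args) {
        String[] rows = {"abc123", "xyz", "hive99"};

        // Constant regex: compiled once, when the function is set up.
        DerivedArgument<String, Pattern> constRegex =
                new DerivedArgument<String, Pattern>(Pattern::compile, true, "\\d+");
        for (String row : rows) {
            constRegex.get("\\d+").matcher(row).find();
        }
        System.out.println("constant regex compiles: " + constRegex.loads);

        // Non-constant regex that differs on every row: one compile per row.
        String[] perRowRegex = {"\\d+", "[a-z]+", "h.*"};
        DerivedArgument<String, Pattern> varRegex =
                new DerivedArgument<String, Pattern>(Pattern::compile, false, null);
        for (int i = 0; i < rows.length; i++) {
            varRegex.get(perRowRegex[i]).matcher(rows[i]).find();
        }
        System.out.println("per-row regex compiles: " + varRegex.loads);
    }
}
```

Run as written, this should report one compile for the constant regex and three for the per-row regex. The same wrapper could, under the same assumptions, hold a loader that reads a file by name, so the file is re-read only when the filename argument changes rather than on every row.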



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
