[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544440#comment-14544440 ]
Alexander Pivovarov commented on HIVE-2327:
-------------------------------------------

Is there a standard approach to UDF performance testing in Hive?
Are there any tables that are usually used for UDF performance tests in Hive?

If not, I can create a 1-million-row table with a string column t1 containing random 100-character strings.
Then I would run something like:
{code}
select count(*) from t where t.t1 regexp '.*abc.*';
{code}

> Optimize REGEX UDFs with constant parameter information
> -------------------------------------------------------
>
>                 Key: HIVE-2327
>                 URL: https://issues.apache.org/jira/browse/HIVE-2327
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Adam Kramer
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-2327.01.patch, HIVE-2327.2.patch
>
>
> There are a lot of UDFs which would show major performance differences if one
> assumes that some of their arguments are constant.
> Consider, for example, any UDF that takes a regular expression as input: this
> can be compiled once (fast) if it's a constant, or once per row (wicked slow)
> if it's not a constant.
> Or, consider any UDF that reads from a file and/or takes a filename as input;
> it would have to re-read the whole file if the filename changes.
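
For illustration, here is a minimal Java sketch of the difference the issue description refers to: compiling the pattern once per row versus once for a constant argument. This is not Hive's UDF code or the attached patch. The class and method names are hypothetical, and the use of Matcher.find() for matching is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Hive code: contrasts per-row and one-time regex compilation.
public class RegexConstantSketch {

    // Treats the pattern as a non-constant argument: Pattern.compile runs for every row.
    public static long countRecompiling(List<String> rows, String regex) {
        long matches = 0;
        for (String row : rows) {
            if (Pattern.compile(regex).matcher(row).find()) {
                matches++;
            }
        }
        return matches;
    }

    // Treats the pattern as a constant: Pattern.compile runs once, before the row loop.
    public static long countPrecompiled(List<String> rows, String regex) {
        Pattern compiled = Pattern.compile(regex);
        long matches = 0;
        for (String row : rows) {
            if (compiled.matcher(row).find()) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the table column t1; a real test would use far more rows.
        List<String> rows = Arrays.asList("xxabcxx", "nothing", "abc");
        System.out.println(countRecompiling(rows, ".*abc.*"));
        System.out.println(countPrecompiled(rows, ".*abc.*"));
    }
}
```

Both methods return the same count; only the number of Pattern.compile calls differs, which is the cost a constant-argument optimization would remove.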