[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543059#comment-14543059 ]
Alexander Pivovarov commented on HIVE-2327:
-------------------------------------------

I ran the vectorization_short_regress.q test locally and noticed a couple of changes in the q.out file:
- rlike is replaced with regexp (they are synonyms)
- the parentheses around "(a regexp b)" are removed in the EXPLAIN query output

I also noticed that the old UDFRegExp registration set isOperator=true:
{code}
system.registerUDF("rlike", UDFRegExp.class, true);
system.registerUDF("regexp", UDFRegExp.class, true);
{code}

But the new implementation extends GenericUDF, and GenericUDF registration does not have an isOperator parameter. Can this cause any issues?
{code}
system.registerGenericUDF("rlike", GenericUDFRegExp.class);
system.registerGenericUDF("regexp", GenericUDFRegExp.class);
{code}

> Optimize REGEX UDFs with constant parameter information
> -------------------------------------------------------
>
>                 Key: HIVE-2327
>                 URL: https://issues.apache.org/jira/browse/HIVE-2327
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Adam Kramer
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-2327.01.patch
>
>
> There are a lot of UDFs which would show major performance differences if one
> assumes that some of their arguments are constant.
> Consider, for example, any UDF that takes a regular expression as input: the
> pattern can be compiled once (fast) if it's a constant, or once per row (wicked
> slow) if it's not a constant.
> Or, consider any UDF that reads from a file and/or takes a filename as input;
> it would have to re-read the whole file if the filename changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
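To illustrate the constant-regex point from the issue description, here is a minimal, self-contained Java sketch. It is not Hive's GenericUDFRegExp and uses no Hive API; the class name RegexMatchSketch and its methods are hypothetical. It assumes the caller already knows whether the regex argument is constant (in a real GenericUDF that knowledge would come from the argument's object inspector at initialization time). Matching uses Matcher.find(), i.e. "some substring matches", which is assumed here to approximate rlike/regexp semantics.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Hive's implementation: shows why knowing that the
// regex argument is constant lets us compile it once instead of once per row.
final class RegexMatchSketch {
    // Non-null only when the regex was known to be constant up front.
    private final Pattern constantPattern;
    // Cache for the non-constant case: last regex seen and its compiled form.
    private String lastRegex;
    private Pattern lastPattern;
    // Number of Pattern.compile calls, exposed so the cost difference is visible.
    private int compileCount;

    // constantRegex: the regex if it is a constant, or null if it varies per row.
    RegexMatchSketch(String constantRegex) {
        if (constantRegex != null) {
            constantPattern = Pattern.compile(constantRegex); // compiled exactly once
            compileCount = 1;
        } else {
            constantPattern = null;
        }
    }

    // Returns true if any substring of value matches the regex (find() semantics).
    // When a constant pattern was supplied, the per-row regex argument is ignored.
    boolean matches(String value, String regex) {
        Pattern p;
        if (constantPattern != null) {
            p = constantPattern;
        } else {
            if (!regex.equals(lastRegex)) {
                // Recompile only when the per-row regex actually changes.
                lastPattern = Pattern.compile(regex);
                lastRegex = regex;
                compileCount++;
            }
            p = lastPattern;
        }
        return p.matcher(value).find();
    }

    int compileCount() {
        return compileCount;
    }
}
```

With a constant regex, compileCount() stays at 1 no matter how many rows are evaluated. In the non-constant case, caching the last compiled pattern only helps when consecutive rows repeat the same regex; in the worst case it degrades to one compilation per row, which is the "wicked slow" path the description mentions.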