[ https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543068#comment-14543068 ]
Alexander Pivovarov commented on HIVE-2327:
-------------------------------------------

I found the reason for the diff in the q.out files and the place where isOperator is used.

1. GenericUDFBridge.isOperator is only used in the getDisplayString method. When it is set, this method adds parentheses around the expression, as in "(a regexp b)":
{code}
if (isOperator) {
  ...
  return "(" + children[0] + " " + udfName + " " + children[1] + ")";
}
{code}
I do not think we need parentheses in the getDisplayString output. This is why the new GenericUDFRegExp.getDisplayString() returns just
{code}
@Override
public String getDisplayString(String[] children) {
  return children[0] + " regexp " + children[1];
}
{code}

2. The reason rlike is replaced with regexp in the query plan is that GenericUDFRegExp.getFuncName returns "regexp" (because it is the primary name for the function):
{code}
@Override
protected String getFuncName() {
  return "regexp";
}
{code}

I'll update the q.out files soon.

> Optimize REGEX UDFs with constant parameter information
> -------------------------------------------------------
>
>                 Key: HIVE-2327
>                 URL: https://issues.apache.org/jira/browse/HIVE-2327
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Adam Kramer
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-2327.01.patch
>
>
> There are a lot of UDFs which would show major performance differences if one
> assumes that some of their arguments are constant.
> Consider, for example, any UDF that takes a regular expression as input: it
> can be compiled once (fast) if it's a constant, or once per row (wicked slow)
> if it's not a constant.
> Or, consider any UDF that reads from a file and/or takes a filename as input;
> it would have to re-read the whole file if the filename changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
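
For illustration only, the "compile once if constant" idea from the issue description can be sketched in plain Java. This is not the code in HIVE-2327.01.patch: the class CachedRegexMatcher and its methods are hypothetical names, it uses only java.util.regex, and it assumes that a pattern may match anywhere in the input (find() semantics) and that a null argument simply yields false.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Hive: reuses the compiled pattern while the regex text stays the same. */
final class CachedRegexMatcher {
    private String lastRegex;     // regex text that lastPattern was compiled from
    private Pattern lastPattern;  // compiled form of lastRegex
    private int compileCount;     // number of Pattern.compile calls, kept only to make the effect visible

    boolean matches(String input, String regex) {
        if (input == null || regex == null) {
            return false; // simplification: SQL NULL semantics are not modeled in this sketch
        }
        // Recompile only when the regex text differs from the previous call.
        // With a constant regex argument this branch runs exactly once.
        if (!regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        // Assumption: the pattern may match anywhere in the input.
        return lastPattern.matcher(input).find();
    }

    int getCompileCount() {
        return compileCount;
    }
}
```

With a constant regex, every row after the first reuses the cached Pattern, so the cost of Pattern.compile is paid once per query instead of once per row. If the regex changes between rows, the sketch falls back to recompiling, which matches the slow case described in the issue.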