[ https://issues.apache.org/jira/browse/HIVE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737987#comment-13737987 ]
Teddy Choi commented on HIVE-4100:
----------------------------------

If we allow the "\uffff" escape form, then "\UDFBa" in "hive\ql\udf\UDFBack.java" may be parsed as a Unicode character, which is ambiguous. How about this way?

{code}
REGEXP_REPLACE(some_column, "[^\\u0000-\\uffff]", UNESCAPE_UNICODE("\ufffd"))
{code}

In this example, UNESCAPE_UNICODE is just a placeholder name.

> Improve regex_replace UDF to allow non-ascii characters
> -------------------------------------------------------
>
>                 Key: HIVE-4100
>                 URL: https://issues.apache.org/jira/browse/HIVE-4100
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 0.10.0
>            Reporter: Mark Grover
>            Assignee: Mark Grover
>
> There have been a few email threads on the user mailing list regarding the
> regex_replace UDF not supporting non-ASCII characters. We should validate
> that and improve the UDF to allow it. The Translate UDF will be a good
> reference, since it does that by using code points instead of characters.
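A minimal sketch of the idea, written in plain Java with java.util.regex rather than Hive's actual UDF code. The class name UnicodeReplaceSketch and the method unescapeUnicode are hypothetical; unescapeUnicode only stands in for the placeholder UNESCAPE_UNICODE from the comment, and it recognizes only the lowercase \u form (an assumption). The point it illustrates: escapes are decoded only inside that explicit call, so an ordinary literal such as a Windows path is never reinterpreted. It also shows the char-versus-code-point distinction the issue mentions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeReplaceSketch {

    // Matches a literal backslash, a lowercase 'u', and exactly four hex digits.
    // Lowercase-only is an assumption of this sketch (it follows Java's escape form).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical stand-in for the UNESCAPE_UNICODE placeholder: converts each escape
    // sequence into the UTF-16 unit it names. Only the argument of this call is decoded,
    // so other string literals are left exactly as written.
    static String unescapeUnicode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Mirrors the REGEXP_REPLACE call from the comment: replaces every code point
    // above U+FFFF. java.util.regex reads the input by code point, so a surrogate
    // pair is replaced as one match, not two.
    static String replaceNonBmp(String input, String replacement) {
        return input.replaceAll("[^\\u0000-\\uffff]", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // one code point, two chars
        String text = "a" + emoji + "b";

        // Counting by char versus by code point, the distinction the issue points to.
        System.out.println(text.length());                          // 4
        System.out.println(text.codePointCount(0, text.length()));  // 3

        String replacement = unescapeUnicode("\\ufffd");            // the single char U+FFFD
        System.out.println(replaceNonBmp(text, replacement).equals("a\uFFFDb")); // true

        // The path from the comment, passed through the helper, comes back unchanged:
        // its backslash-u is not followed by four hex digits, and "UDFBa" uses an uppercase U.
        String path = "hive\\ql\\udf\\UDFBack.java";
        System.out.println(unescapeUnicode(path).equals(path));     // true
    }
}
```

Keeping the escape decoding in a separate function means the regex pattern and replacement stay ordinary strings, which is the disambiguation the comment proposes.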