[ https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973730#comment-13973730 ]
Szehon Ho commented on HIVE-6843:
---------------------------------

Thanks for the review. As I understand it, you are passing a string literal to the Text constructor, and it is not interpreting \uD801 as one char, so there are actually 6 chars there: '\', 'u', 'D', '8', '0', '1'.

I tried the following test and it seemed to work:

    char[] chararray = new char[] {'1', '2', '3', '\uD801', '\uDC00', '4', '5', '6'};
    String str = new String(chararray);
    Assert.assertEquals(5, GenericUDFUtils.findText(new Text(str), new Text("4"), 0));

I guess the second check was supposed to be 5, not 4.

> INSTR for UTF-8 returns incorrect position
> ------------------------------------------
>
>                 Key: HIVE-6843
>                 URL: https://issues.apache.org/jira/browse/HIVE-6843
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Clif Kranish
>            Assignee: Szehon Ho
>            Priority: Minor
>         Attachments: HIVE-6843.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
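
For reference, the positions discussed in the comment can be checked in plain Java without Hive on the classpath. The sketch below is only an illustration: the class name SurrogateDemo and its helper methods are hypothetical and are not part of the patch or of GenericUDFUtils. It builds the same string as the test above and reports where "4" sits when counted in UTF-16 chars, in Unicode code points, and in UTF-8 bytes, and it shows that an escaped backslash sequence is six separate chars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Hive.
public class SurrogateDemo {

    // Same string as in the comment: "123", U+10400 as a surrogate pair, then "456".
    static String sample() {
        char[] chars = {'1', '2', '3', '\uD801', '\uDC00', '4', '5', '6'};
        return new String(chars);
    }

    // Position of needle counted in UTF-16 chars, as String.indexOf reports it.
    static int charIndex(String s, String needle) {
        return s.indexOf(needle);
    }

    // Position of needle counted in Unicode code points, or -1 if absent.
    static int codePointIndex(String s, String needle) {
        int i = s.indexOf(needle);
        return i < 0 ? -1 : s.codePointCount(0, i);
    }

    // Position of needle counted in bytes of the UTF-8 encoding, or -1 if absent.
    static int utf8ByteOffset(String s, String needle) {
        int i = s.indexOf(needle);
        return i < 0 ? -1 : s.substring(0, i).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());               // 8 chars (the pair counts as two)
        System.out.println(charIndex(s, "4"));        // 5
        System.out.println(codePointIndex(s, "4"));   // 4
        System.out.println(utf8ByteOffset(s, "4"));   // 7 (3 ASCII bytes + 4 bytes for U+10400)
        // A doubled backslash keeps the escape as literal text: six separate chars.
        System.out.println("\\uD801".length());       // 6
    }
}
```

The three helpers give different answers for the same character, which is why a test that mixes byte offsets and char offsets can disagree about the expected position.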