Gopal V created HIVE-14450:
------------------------------

             Summary: Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
                 Key: HIVE-14450
                 URL: https://issues.apache.org/jira/browse/HIVE-14450
             Project: Hive
          Issue Type: Improvement
            Reporter: Gopal V

{code}
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    int end = start + length;

    // count characters forward
    int j = start;
    int charCount = 0;
    while(j < end) {
      // UTF-8 continuation bytes have 2 high bits equal to 0x80.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return (j - start);
  }
{code}

A UTF-8 character occupies at least 1 byte, so an input of length bytes holds at most length characters. When length <= maxLength nothing can be truncated, and the loop does not need to read the bytes at all.

This should not dirty the L1 cache if maxLength is 4096 and the input string has 256 bytes.
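
The fast path implied by the summary could look roughly like the sketch below. This is only an illustration, not the committed Hive patch: the class name TruncateSketch and the main() demo are hypothetical, and the sketch assumes length is non-negative and the input is valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not the actual Hive StringExpr class.
final class TruncateSketch {

  // Returns the number of bytes to keep so that at most maxLength
  // characters of the UTF-8 value in bytes[start, start + length) remain.
  static int truncate(byte[] bytes, int start, int length, int maxLength) {
    // Fast path (assumes length >= 0): each UTF-8 character takes at least
    // one byte, so length bytes cannot hold more than length characters.
    // Nothing can be cut off, and the byte array is never read.
    if (length <= maxLength) {
      return length;
    }

    // Slow path: same forward scan as the original method.
    int end = start + length;
    int j = start;
    int charCount = 0;
    while (j < end) {
      // A byte whose top two bits are not 10 starts a new character.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return j - start;
  }

  public static void main(String[] args) {
    // "h\u00e9llo" is 5 characters but 6 bytes in UTF-8.
    byte[] value = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    System.out.println(truncate(value, 0, value.length, 4096)); // fast path
    System.out.println(truncate(value, 0, value.length, 2));    // scans bytes
  }
}
```

The early return gives the same result as the loop whenever length <= maxLength, because the loop can only break after seeing maxLength + 1 character-start bytes, which would require more than length bytes.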