[ https://issues.apache.org/jira/browse/HIVE-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842376#comment-15842376 ]
Gopal V commented on HIVE-15741:
--------------------------------

[~teddy.choi]: StringExpr::equal() was written to be much faster for the common case where the strings aren't equal. The most common long strings are URLs, which usually differ in the suffix and in length (len != len & last byte is different). This patch prevents such comparisons from taking the fast path, which was added to support common-crawl and similar clickstream data streams.

The Unsafe code is only faster if the strings are equal or differ in the prefix.

> Faster unsafe byte array comparisons
> ------------------------------------
>
>                 Key: HIVE-15741
>                 URL: https://issues.apache.org/jira/browse/HIVE-15741
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Minor
>         Attachments: HIVE-15741.1.patch
>
>
> Byte array comparison is heavily used in joins and string conditions. A pure
> Java implementation is simple but not performant. An implementation with
> Unsafe#getLong is much faster. It's already implemented in
> org.apache.hadoop.io.WritableComparator#compare. The WritableComparator class
> handles exceptional cases, including different endianness and no access to
> Unsafe, and it has been used for many years in production.
> This patch will replace pure Java byte array comparisons with safe and faster
> unsafe ones to get more performance.
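
For illustration, the fast path described in the comment (reject on a length mismatch, then on a differing last byte, before scanning from the front) might look roughly like the sketch below. This is not the actual StringExpr code from Hive. The class name `FastPathEquals`, the method signature, and the sample URLs are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a suffix-aware equality check; not Hive's implementation.
final class FastPathEquals {

    // Returns true only if the two byte ranges hold identical contents.
    static boolean equal(byte[] a, int aStart, int aLen,
                         byte[] b, int bStart, int bLen) {
        // Fast path 1: ranges of different length can never be equal.
        if (aLen != bLen) {
            return false;
        }
        if (aLen == 0) {
            return true;
        }
        // Fast path 2: URLs with a shared prefix often differ in the last byte,
        // so check it before scanning from the front.
        if (a[aStart + aLen - 1] != b[bStart + bLen - 1]) {
            return false;
        }
        // Slow path: compare the remaining bytes front to back.
        for (int i = 0; i < aLen - 1; i++) {
            if (a[aStart + i] != b[bStart + i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] u1 = "http://example.com/page/1".getBytes(StandardCharsets.UTF_8);
        byte[] u2 = "http://example.com/page/2".getBytes(StandardCharsets.UTF_8);
        byte[] u3 = "http://example.com/page/10".getBytes(StandardCharsets.UTF_8);

        System.out.println(equal(u1, 0, u1.length, u2, 0, u2.length)); // last byte differs
        System.out.println(equal(u1, 0, u1.length, u3, 0, u3.length)); // lengths differ
        System.out.println(equal(u1, 0, u1.length, u1, 0, u1.length)); // identical
    }
}
```

In this sketch the first two calls return without entering the loop, which is the case the comment says matters for URL-heavy data. A word-at-a-time comparison that starts from the prefix has to read the long shared prefix before it reaches the differing suffix.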