[ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-16592: --------------------------- Description: {code} public static int calculateLongHashCode(long key) { key = (~key) + (key << 21); // key = (key << 21) - key - 1; key = key ^ (key >>> 24); key = (key + (key << 3)) + (key << 8); // key * 265 key = key ^ (key >>> 14); key = (key + (key << 2)) + (key << 4); // key * 21 key = key ^ (key >>> 28); key = key + (key << 31); return (int) key; } {code} Does not mix enough bits into the lower 32 bits, which are used for the bucket probes. The 1997 document lists {code} public int hash6432shift(long key) { key = (~key) + (key << 18); // key = (key << 18) - key - 1; key = key ^ (key >>> 31); key = key * 21; // key = (key + (key << 2)) + (key << 4); key = key ^ (key >>> 11); key = key + (key << 6); key = key ^ (key >>> 22); return (int) key; } {code} as the algorithm for keeping the lower 32 bits well distributed. > Vectorization: Long hashes use hash64shift and not hash6432shift to generate > int hashCodes > ------------------------------------------------------------------------------------------ > > Key: HIVE-16592 > URL: https://issues.apache.org/jira/browse/HIVE-16592 > Project: Hive > Issue Type: Bug > Reporter: Gopal V > Priority: Minor > > {code} > public static int calculateLongHashCode(long key) { > key = (~key) + (key << 21); // key = (key << 21) - key - 1; > key = key ^ (key >>> 24); > key = (key + (key << 3)) + (key << 8); // key * 265 > key = key ^ (key >>> 14); > key = (key + (key << 2)) + (key << 4); // key * 21 > key = key ^ (key >>> 28); > key = key + (key << 31); > return (int) key; > } > {code} > Does not mix enough bits into the lower 32 bits, which are used for the > bucket probes. > The 1997 document lists > {code} > public int hash6432shift(long key) > { > key = (~key) + (key << 18); // key = (key << 18) - key - 1; > key = key ^ (key >>> 31); > key = key * 21; // key = (key + (key << 2)) + (key << 4); > key = key ^ (key >>> 11); > key = key + (key << 6); > key = key ^ (key >>> 22); > return (int) key; > } > {code} > as the algorithm for keeping the lower 32 bits well distributed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)