[ https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-7617:
--------------------------
    Attachment: hashmap-wb-fixes.png

On my VM this patch increases memory usage for small JOINs, and I can't find any perf difference from the shift-size fixes. Once the JIT kicks in, both pre-patch and post-patch have inline constant replacements for the "ldiv".

{code}
0x00007fe284b89782: dec ebp
0x00007fe284b89783: mov edx, ebp
0x00007fe284b89785: dec ecx
0x00007fe284b89786: and edx, 0x0000000000008000
0x00007fe284b8978c: dec ecx
0x00007fe284b8978d: mov ecx, ebp
0x00007fe284b8978f: dec eax
0x00007fe284b89790: shr ecx, 0x0000000000000018
{code}

The rest is less clear to me. The new IntGetAdaptor class has turned off inlining for the other GetAdaptor, so this is only faster if all my JOINs have only int keys. If you mix an INT key and a STRING key in the same vertex (not even the same JOIN cond), the JIT seems to get a bit confused and turns off all the monomorphic optimizations that the previous impl had. This still triggers slow code in copyToStandardObject() before entering the fast path.

The first change in perf happens after about 9k rows. The sampling profiler itself seems to turn off a bunch of these optimizations; I was able to confirm this by using Linux perf counters instead.

!hashmap-wb-fixes.png!

I can say one thing for sure: this would probably have helped us if we wrote C++, where runtime recompilation with constants does not happen. I'm not sure whether this patch is useful as long as we use the JVM.
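To make the inlining concern above concrete, here is a minimal sketch of a single interface call site that sees either one or two receiver types. All names (KeyAdaptor, IntKeyAdaptor, StringKeyAdaptor, probeAll) are hypothetical stand-ins, not Hive's GetAdaptor classes, and the sketch only shows the shape of the code. Whether HotSpot actually inlines or de-optimizes here depends on the JVM version and flags, so any real effect has to be measured.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; these are not Hive's GetAdaptor classes.
final class CallSiteProfileSketch {

  /** Stand-in for an adaptor that turns a lookup key into a hash code. */
  interface KeyAdaptor {
    int hash(Object key);
  }

  static final class IntKeyAdaptor implements KeyAdaptor {
    @Override
    public int hash(Object key) {
      return ((Integer) key).intValue();
    }
  }

  static final class StringKeyAdaptor implements KeyAdaptor {
    @Override
    public int hash(Object key) {
      int h = 1;
      for (byte b : ((String) key).getBytes(StandardCharsets.UTF_8)) {
        h = 31 * h + b;
      }
      return h;
    }
  }

  /** Contains the single call site whose receiver types the JIT profiles. */
  static long probeAll(KeyAdaptor adaptor, Object[] keys) {
    long sum = 0;
    for (Object key : keys) {
      sum += adaptor.hash(key); // one bytecode call site shared by all callers
    }
    return sum;
  }

  public static void main(String[] args) {
    Object[] intKeys = {1, 2, 3};
    Object[] stringKeys = {"a", "b"};
    // Only IntKeyAdaptor reaches the call site: one receiver type in the profile.
    System.out.println(probeAll(new IntKeyAdaptor(), intKeys));
    // A second receiver type now reaches the same call site.
    System.out.println(probeAll(new StringKeyAdaptor(), stringKeys));
  }
}
```

If only the first probeAll call ever ran, the call site would see a single receiver type, which is the monomorphic case. The second call is the analogue of mixing an INT key and a STRING key in the same vertex.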
> optimize bytes mapjoin hash table read path wrt serialization, at least for common cases
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-7617
>                 URL: https://issues.apache.org/jira/browse/HIVE-7617
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-7617.01.patch, HIVE-7617.02.patch, HIVE-7617.patch, HIVE-7617.prelim.patch, hashmap-wb-fixes.png
>
>
> The BytesBytes hash table stores keys in a byte array for compact representation; however, that means that the straightforward implementation of lookups serializes lookup keys to byte arrays, which is relatively expensive.
> We can either shortcut hashcode and compare for common types on the read path (integral types, which would cover most of the real-world keys), or specialize the hashtable and from BytesBytes... create LongBytes, StringBytes, or whatever.
> The first one seems simpler now.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
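The first option in the issue description (shortcutting hashcode and compare for integral keys) could look roughly like the sketch below. It assumes a hypothetical layout in which a long key is stored as 8 big-endian bytes, and it uses a simple 31-multiplier hash for illustration. Neither is taken from Hive's actual serialization or hashing code, and all class and method names are made up.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; the 8-byte big-endian key layout and the hash function
// are assumptions, not Hive's actual serialization or hashing.
final class LongKeyShortcutSketch {

  /** Slow path: serialize the lookup key, as a bytes-keyed table would. */
  static byte[] serialize(long key) {
    return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
  }

  /** Hash over serialized bytes (simple 31-multiplier hash, for illustration). */
  static int hashBytes(byte[] bytes, int offset, int length) {
    int h = 1;
    for (int i = offset; i < offset + length; i++) {
      h = 31 * h + bytes[i];
    }
    return h;
  }

  /** Fast path: the same hash, computed from the long without allocating. */
  static int hashLong(long key) {
    int h = 1;
    for (int shift = 56; shift >= 0; shift -= 8) {
      h = 31 * h + (byte) (key >>> shift); // same byte order as serialize()
    }
    return h;
  }

  /** Fast path: compare a long with a stored 8-byte key in place. */
  static boolean equalsStored(long key, byte[] store, int offset) {
    long stored = 0;
    for (int i = 0; i < Long.BYTES; i++) {
      stored = (stored << 8) | (store[offset + i] & 0xFF);
    }
    return stored == key;
  }
}
```

The point of the design is that hashLong must agree with hashBytes(serialize(key), 0, 8) for every key, so that a lookup on the fast path lands in the same slot as a key that was inserted through the serializing path.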