[ https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-6429:
-----------------------------------
    Attachment: HIVE-6429.04.patch

For now, this patch addresses the other feedback... I will have a separate patch to use BinarySortableSerDe; I just need to hack around the vectorized path. I don't think it's worth it, though: it's convoluted and still has to keep the type array and a separate path for vectorization. There are also additional changes because, for example, hasAnyNulls would be complicated and expensive with the BSSD format, so it has to be additionally retrieved at key creation time for the big table key in MJO.

> MapJoinKey has large memory overhead in typical cases
> -----------------------------------------------------
>
>                 Key: HIVE-6429
>                 URL: https://issues.apache.org/jira/browse/HIVE-6429
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch,
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and
> construction), so there's no need to have an array of writables in there.
> Assuming all the keys for a table have the same structure, for the common
> case where keys are primitive types, we can store something like a byte array
> combination of keys to reduce the memory usage. Will probably speed up
> compares too.
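
To illustrate the byte-array idea from the issue description, here is a minimal sketch of a key that serializes its primitive columns into a single byte[] and implements only hashCode and equals on top of it. This is not the code from the attached patches: the class name ByteArrayJoinKey, the factory method of(...), and the one-byte type tags are all hypothetical, and the encoding is a plain DataOutputStream layout rather than BinarySortableSerDe.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical compact join key: all key columns packed into one byte[]. */
final class ByteArrayJoinKey {
    private final byte[] bytes; // serialized key columns, in order
    private final int hash;     // cached, since hash tables ask for it repeatedly

    private ByteArrayJoinKey(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    /**
     * Builds a key from primitive-like columns. Only null, Integer, Long and
     * String are handled in this sketch; each column is written as a one-byte
     * tag (an assumption of this example) followed by its value.
     */
    static ByteArrayJoinKey of(Object... columns) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (Object c : columns) {
                if (c == null) {
                    out.writeByte(0);
                } else if (c instanceof Integer) {
                    out.writeByte(1);
                    out.writeInt((Integer) c);
                } else if (c instanceof Long) {
                    out.writeByte(2);
                    out.writeLong((Long) c);
                } else if (c instanceof String) {
                    byte[] s = ((String) c).getBytes(StandardCharsets.UTF_8);
                    out.writeByte(3);
                    out.writeInt(s.length); // length prefix keeps the encoding unambiguous
                    out.write(s);
                } else {
                    throw new IllegalArgumentException("unsupported key type: " + c.getClass());
                }
            }
            out.flush();
            return new ByteArrayJoinKey(buf.toByteArray());
        } catch (IOException e) {
            // Writing to an in-memory buffer is not expected to fail.
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        // A single array comparison replaces per-column object comparisons.
        return o instanceof ByteArrayJoinKey
                && Arrays.equals(bytes, ((ByteArrayJoinKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

Each key then holds one array plus a cached int instead of one object per column, which is the kind of saving the description is after. Comparison is a single Arrays.equals call.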