[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

Sergey Shelukhin (JIRA) Fri, 21 Feb 2014 23:37:16 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sergey Shelukhin updated HIVE-6429:
-----------------------------------

    Attachment: HIVE-6429.04.patch

For now this patch addresses the other feedback. I will post a separate patch
that uses BinarySortableSerDe; I just need to hack around the vectorized path.
That said, I don't think it's worth it: it's convoluted and still has to keep
the type array and a separate path for vectorization. There are also
additional changes, because, for example, hasAnyNulls would be complicated
and expensive with the BSSD format, so it has to be additionally retrieved at
key creation time for the big table key in MJO.
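
To illustrate the hasAnyNulls point, here is a minimal sketch of recording the
null flag while the key is being serialized, rather than re-parsing the bytes
later. The class SerializedKey, the method fromInts, and the one-marker-byte
layout are all hypothetical; this is not the BinarySortableSerDe format and not
the code in the patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Hive's MapJoinKey and not the BSSD layout.
final class SerializedKey {
    final byte[] bytes;
    final boolean hasAnyNulls; // computed once, at key creation time

    private SerializedKey(byte[] bytes, boolean hasAnyNulls) {
        this.bytes = bytes;
        this.hasAnyNulls = hasAnyNulls;
    }

    // Assumed layout: one marker byte per field (0 = null, 1 = present),
    // followed by 4 big-endian bytes when the field is present.
    static SerializedKey fromInts(Integer... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean sawNull = false;
        for (Integer field : fields) {
            if (field == null) {
                out.write(0);
                sawNull = true; // remember now; no need to scan the bytes later
            } else {
                int v = field;
                out.write(1);
                out.write(v >>> 24);
                out.write(v >>> 16);
                out.write(v >>> 8);
                out.write(v);
            }
        }
        return new SerializedKey(out.toByteArray(), sawNull);
    }
}
```

With this shape, a caller can check key.hasAnyNulls directly instead of
decoding the serialized fields to find out whether any of them is null.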

> MapJoinKey has large memory overhead in typical cases
> -----------------------------------------------------
>
>                 Key: HIVE-6429
>                 URL: https://issues.apache.org/jira/browse/HIVE-6429
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. This will probably speed up 
> comparisons too.
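
The idea in the description can be sketched roughly as follows. This is only
an illustration under stated assumptions: the class CompactKey and its factory
method are hypothetical names, and the fixed (long, int) key structure is
chosen just for the example. It is not the implementation in the attached
patches.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; not Hive's MapJoinKey.
final class CompactKey {
    private final byte[] bytes; // all key fields packed into one array
    private final int hash;     // cached, since the key is immutable

    private CompactKey(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    // Assumes every key of the table has the same (long, int) structure,
    // so no per-field type information is stored with each key.
    static CompactKey of(long first, int second) {
        byte[] packed = ByteBuffer.allocate(12).putLong(first).putInt(second).array();
        return new CompactKey(packed);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CompactKey
            && Arrays.equals(bytes, ((CompactKey) other).bytes);
    }
}
```

A key built this way holds a single byte array instead of one object per key
column, and equality becomes a single array comparison. Two keys built from
the same values are equal and hash identically, so they can be used in a
HashMap in place of a key backed by an array of writables.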



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
