[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907758#comment-13907758 ]
Sergey Shelukhin commented on HIVE-6430:
----------------------------------------

Here's the summary of the overhead per entry /after both of the above patches go in/ (before them, the overhead in the key and value is significantly bigger).

HashTable:
* Entry array: 8+ bytes
* Entry: 32 bytes
* Key and value objects: 32 bytes

Key:
* Byte array object + length: 20 bytes
* Field count and null mask: 1 byte
* Rounding to 8 bytes: 0-7 bytes

Row:
* Fields: 8 bytes
* Object array object + length: 24 bytes
* Per-column writable object: 16 bytes (assuming all the other fields in the writables are useful data)

"Guaranteed" overhead per entry: 125 bytes, plus writables for the row values and padding on the key.
Example, double key, row with one field: additional 21 bytes per entry, ~146 total.
Example, int key, row with 5 fields: additional 87 bytes per entry, ~212 total.
Plus some overhead depending on HashMap fullness.

So that's a lot of overhead (it depends on the data, of course - if the row contains cat photos in binary then 150-200 bytes is not much). The general approach to getting rid of the per-entry overhead is a hashtable implemented on top of an array, with open addressing, storing the actual variable-length keys and rows in big flat array(s) of byte[]-s or objects. That would get rid of the key and row object overhead, most of the hashmap overhead, most of the key overhead, and most/some (see below) of the row overhead. The good thing about this table is that it's read-only after initial creation and we never delete, so we don't have to worry about many scenarios.

*Details (scroll down for estimates)*

Simple case, assuming we can convert both key and row into bytes: allocate largish fixed-size byte arrays to have an effectively infinite write buffer (or the array can be reallocated if needed, or a combination). Have a flat, custom-made hash table similar to the HPPC one that stores offsets into that buffer in its key array (of longs) and has no value or state arrays. Some additional stuff, for example lengths or null bitmasks, can also be fit into the key array values.

When loading, incoming writables would write the keys and values into the write buffer. We know the schema, so we don't have to worry about storing types, field offsets, etc. Then write a fixed-size tail with e.g. the length of the key and value, to know what to compare and where the value starts. Because there's no requirement to allocate a fixed number of bytes like there is now, a variable-length format can be used if needed to save space... but it shouldn't be too complicated. Probably shouldn't use ORC there :) Then the key array uses a standard hashtable put to store the offset of that tail.

When getting, the key can still be compared the same way as now, as a byte array, with one extra "dereference" from the key array to get to the actual key by index. For values, writables will have to be re-created when the row is requested, because everything depends on writables now. The writables will trivially read from the byte array at the stored offset. Obviously this has a performance cost. Note that this is not like the current lazy deserialization:
1) We do not deserialize on demand - the final writables are just written to/read from the byte array, so creating them should be cheaper than deserializing.
2) Writables are not preserved for future use and are created every time the row is accessed, which has a perf cost but saves memory.

Total overhead per entry would be around 14-16 bytes, plus some fixed or semi-fixed overhead depending on the write buffer allocation scheme. In the above examples the overhead would go from 146 and 212 bytes to 16 and 16.
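To make the simple case concrete, here's a minimal sketch of that kind of table. This is not Hive code - the class and method names are made up, it assumes unique keys, a single reallocating buffer, and that an upper bound on the entry count is known up front; the real MapJoin table would also need to chain multiple rows per key and pick a smarter buffer scheme.

{code:java}
import java.util.Arrays;

// Hypothetical sketch, not Hive code: a flat, open-addressing table whose only per-entry
// state is one long in refs[] -- the offset of a record's fixed-size tail inside a shared
// byte[] write buffer.  Record layout: [key bytes][value bytes][keyLen int][valueLen int].
// The table is write-once and never deletes, so linear probing needs no tombstones.
public class FlatJoinTableSketch {
  private static final long EMPTY = -1L;

  private final long[] refs;   // slot -> tail offset into buffer, or EMPTY
  private byte[] buffer;       // shared write buffer for all serialized keys and values
  private int writePos;

  public FlatJoinTableSketch(int maxEntries) {
    // Power-of-two capacity at roughly 50% max load so probe chains stay short.
    int cap = Integer.highestOneBit(Math.max(4, maxEntries) * 2 - 1) << 1;
    refs = new long[cap];
    Arrays.fill(refs, EMPTY);
    buffer = new byte[1024];
  }

  // Loading path: append key and value bytes plus the fixed-size tail, then index the key.
  public void put(byte[] key, byte[] value) {
    ensure(key.length + value.length + 8);
    System.arraycopy(key, 0, buffer, writePos, key.length);
    writePos += key.length;
    System.arraycopy(value, 0, buffer, writePos, value.length);
    writePos += value.length;
    long tail = writePos;
    writeInt(key.length);    // tail: key length
    writeInt(value.length);  // tail: value length

    int slot = hash(key) & (refs.length - 1);
    while (refs[slot] != EMPTY) {
      slot = (slot + 1) & (refs.length - 1);   // linear probing; never deletes
    }
    refs[slot] = tail;
  }

  // Probe path: compare probe key bytes against stored key bytes; returns tail offset or -1.
  public long find(byte[] key) {
    int slot = hash(key) & (refs.length - 1);
    while (refs[slot] != EMPTY) {
      if (keyEquals(refs[slot], key)) {
        return refs[slot];
      }
      slot = (slot + 1) & (refs.length - 1);
    }
    return -1L;
  }

  // Where the value bytes live for a record found by find(); writables would be
  // re-created by reading the buffer from this offset when the row is requested.
  public int valueOffset(long tail) { return (int) tail - readInt((int) tail + 4); }
  public int valueLength(long tail) { return readInt((int) tail + 4); }

  private boolean keyEquals(long tail, byte[] probe) {
    int keyLen = readInt((int) tail);
    if (keyLen != probe.length) {
      return false;
    }
    int keyOffset = (int) tail - readInt((int) tail + 4) - keyLen;
    for (int i = 0; i < keyLen; i++) {
      if (buffer[keyOffset + i] != probe[i]) {
        return false;
      }
    }
    return true;
  }

  private int hash(byte[] key) {
    int h = 1;
    for (byte b : key) {
      h = 31 * h + b;        // any reasonable byte-array hash would do here
    }
    return h & 0x7fffffff;
  }

  private void ensure(int extra) {
    if (writePos + extra > buffer.length) {
      buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, writePos + extra));
    }
  }

  private void writeInt(int v) {
    buffer[writePos++] = (byte) (v >>> 24);
    buffer[writePos++] = (byte) (v >>> 16);
    buffer[writePos++] = (byte) (v >>> 8);
    buffer[writePos++] = (byte) v;
  }

  private int readInt(int pos) {
    return ((buffer[pos] & 0xff) << 24) | ((buffer[pos + 1] & 0xff) << 16)
        | ((buffer[pos + 2] & 0xff) << 8) | (buffer[pos + 3] & 0xff);
  }
}
{code}

The per-entry cost in a structure like this is just the long in refs[] plus the 8-byte tail, which is where the 14-16 byte estimate above comes from.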
Another alternative is similar, but with only the keys in a byte array, and the values in a separate large Object array operating on the same principles, as writables in all their glory. The key array can store indices and lengths for both, probably 2-3 longs per entry depending on what limitations we can accept. So the total overhead would be around 16-24 bytes + 16 per field in the row, but the writables wouldn't need to be re-created. In the above examples the overhead would go from 146 and 212 bytes to 32 and 96.

*Tl;dr and estimates*

The bad thing, obviously, is that without key and row objects all the interfaces around them would cease to exist. This is especially bad for MR due to the convoluted HashTable path with write and read, so in the first cut I think we should go Tez-only and preserve the legacy path with objects for MR. There are several good things...
* We can essentially copy-paste the HPPC long-long hashmap. It probably doesn't fit by itself and we don't need all the features, but it must be simple to convert to the above. So we don't need to code up the open-addressing hashmap ourselves.
* W.r.t. the interface differences, I looked at the divergent paths; the Tez HT loader obviously would be able to do whatever is needed. MapJoinOperator is the only place where there will be problems - it currently creates the key and then calls get(key). Get can be changed to take the row instead, so that it would create the key for the get as necessary.
* Code for byte key creation, comparison, validation, etc., and some other code from the above two patches can be reused; plus I know all I need to know about what needs to be done with writables and the bytes for them.

> MapJoin hash table has large memory overhead
> --------------------------------------------
>
>                 Key: HIVE-6430
>                 URL: https://issues.apache.org/jira/browse/HIVE-6430
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2
> for row) can take several hundred bytes, which is ridiculous. I am reducing
> the size of MJKey and MJRowContainer in other jiras, but in general we don't
> need to have a Java hash table there. We can either use a primitive-friendly
> hashtable like the one from HPPC (Apache-licensed), or some variation, to map
> primitive keys to a single row storage structure without an object per row
> (similar to vectorization).