[
https://issues.apache.org/jira/browse/HIVE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152609#comment-13152609
]
[email protected] commented on HIVE-2555:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2849/
-----------------------------------------------------------
(Updated 2011-11-18 03:03:52.834508)
Review request for Yongqiang He, Ning Zhang and namit jain.
Summary (updated)
-------
Made the hash table in group-by pluggable: a class that supplies hash table
functionality has to implement the ExternalMap interface.
Currently I supply two fully (hopefully) working pluggable classes:
ExternalJavaHashMap and ExternalHPPCObjectObjectOpenHashMap (ExternalJavaMap is
an abstract class that makes it easier to add hash maps that implement the
java.util.Map interface). ExternalMap has some unusual methods that allow
various efficiency tricks. ExternalMap could easily be made more general, but I
decided that is not worth doing at this point (it would be if ExternalMap were
also used by things other than GroupByOperator).
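To illustrate the pluggable design described above, here is a minimal sketch.
It is not the code from the patch: SimpleExternalMap, JavaHashMapAdapter and
GroupByCountSketch are hypothetical names, and the real ExternalMap interface
has more (and different) methods than the four assumed here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the patch's ExternalMap interface.
// The method set is an assumption made for this sketch.
interface SimpleExternalMap<K, V> {
    V get(K key);
    void put(K key, V value);
    int size();
    void clear();
}

// Adapter backed by java.util.HashMap, playing the role that
// ExternalJavaHashMap plays in the patch.
class JavaHashMapAdapter<K, V> implements SimpleExternalMap<K, V> {
    private final Map<K, V> delegate = new HashMap<K, V>();

    public V get(K key) { return delegate.get(key); }
    public void put(K key, V value) { delegate.put(key, value); }
    public int size() { return delegate.size(); }
    public void clear() { delegate.clear(); }
}

// Toy group-by counter that depends only on the interface, so the
// concrete hash table can be swapped without touching this class.
class GroupByCountSketch {
    private final SimpleExternalMap<String, long[]> table;

    GroupByCountSketch(SimpleExternalMap<String, long[]> table) {
        this.table = table;
    }

    // Increments the count for the given group key.
    void process(String key) {
        long[] counter = table.get(key);
        if (counter == null) {
            counter = new long[1];
            table.put(key, counter);
        }
        counter[0]++;
    }

    // Returns the count seen so far, or 0 for an unseen key.
    long countFor(String key) {
        long[] counter = table.get(key);
        return counter == null ? 0L : counter[0];
    }

    public static void main(String[] args) {
        SimpleExternalMap<String, long[]> table =
            new JavaHashMapAdapter<String, long[]>();
        GroupByCountSketch op = new GroupByCountSketch(table);
        for (String key : new String[] {"a", "b", "a"}) {
            op.process(key);
        }
        System.out.println(op.countFor("a")); // prints 2
    }
}
```

Another hash table can then be tried by writing one more adapter class; the
operator code stays unchanged.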
Additionally, a trie implementation was added, but it does not yet support the
full functionality (it currently supports only String, int, long and bool
columns).
I strongly dislike removing 10% of the hash map in GroupByOperator.flush(),
since no HashMap implementation known to me offers an efficient and clean way
to do it; maybe something can be done about that flushing. Additionally, the
method that removes the 10% has not yet been tested with the new
implementations.
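For reference, a partial flush over a plain java.util.Map can be sketched as
below. This is an assumption about the general shape of such a flush, not the
code in GroupByOperator.flush(); PartialFlushSketch and flushPercent are
hypothetical names. It shows why the operation is awkward: the map offers no
"evict N entries" call, so the entries have to be walked and removed one by one
through the iterator.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class PartialFlushSketch {
    // Removes about `percent` percent of the entries, in whatever order the
    // map's iterator yields them, and returns the number removed.
    static <K, V> int flushPercent(Map<K, V> map, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        int toRemove = map.size() * percent / 100;
        int removed = 0;
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (removed < toRemove && it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            // A real operator would forward entry.getKey() and
            // entry.getValue() downstream before dropping the entry.
            it.remove();
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<Integer, String>();
        for (int i = 0; i < 100; i++) {
            map.put(i, "v" + i);
        }
        int removed = flushPercent(map, 10);
        System.out.println(removed + " removed, " + map.size() + " left");
    }
}
```

Whether a non-java.util.Map implementation can support the same pattern depends
on what its iterator allows, which is exactly what still needs testing.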
At this point the new libraries' jars (all under the Apache License 2.0) are
added in a way that "just works"; if there is a more proper way of adding jars,
I am not aware of it.
Because all keys are now passed in KeyWrappers, there is a large overhead.
Which "hashmap" implementation behaves best may change a lot if primitive Java
types were used at some point (e.g. for a group by on multiple columns one
could have hash maps nested in hash maps, one level per column, with the last
level holding the actual value, and that might be the fastest way to do it;
after all, group bys are not commonly done on a really large number of
columns).
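The nested layout mentioned above could look roughly like the following sketch
for a two-column group by. NestedGroupBySketch is a hypothetical name and
nothing like it is in the patch. It uses java.util.HashMap with boxed Integer
keys, so it shows only the nesting structure; the hoped-for savings would need
primitive-keyed maps at each level.

```java
import java.util.HashMap;
import java.util.Map;

class NestedGroupBySketch {
    // Outer map: first group-by column. Inner map: second group-by column.
    // The innermost value is a one-element array used as a mutable counter.
    private final Map<String, Map<Integer, long[]>> groups =
        new HashMap<String, Map<Integer, long[]>>();

    // Counts one row with the given values of the two group-by columns.
    void add(String col1, int col2) {
        Map<Integer, long[]> inner = groups.get(col1);
        if (inner == null) {
            inner = new HashMap<Integer, long[]>();
            groups.put(col1, inner);
        }
        long[] counter = inner.get(col2);
        if (counter == null) {
            counter = new long[1];
            inner.put(col2, counter);
        }
        counter[0]++;
    }

    // Returns the row count for the group, or 0 if the group was never seen.
    long count(String col1, int col2) {
        Map<Integer, long[]> inner = groups.get(col1);
        if (inner == null) {
            return 0L;
        }
        long[] counter = inner.get(col2);
        return counter == null ? 0L : counter[0];
    }

    public static void main(String[] args) {
        NestedGroupBySketch g = new NestedGroupBySketch();
        g.add("x", 1);
        g.add("x", 1);
        g.add("x", 2);
        System.out.println(g.count("x", 1)); // prints 2
    }
}
```

Compared with a single map keyed by a wrapper object, no composite key has to
be allocated or hashed per row, at the cost of one lookup per column.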
Much of the implementation so far was done quite crudely, meaning some
operations could be optimized further even without changing the current Hive
code. Still, I believe this patch is sufficient as a proof of concept and shows
that it is worth continuing the work here.
Currently the regular Java HashMap implementation is set in the conf as the one
to use.
Possibly the new HashMap implementations should be moved to another package.
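As a rough illustration of selecting the implementation from configuration,
the sketch below instantiates a map from a class name by reflection. It is an
assumption about the general mechanism only: MapFactorySketch and newMap are
hypothetical names, the class name would come from whatever property the patch
adds to hive-default.xml (not named here), and the patch's own classes
implement ExternalMap rather than java.util.Map.

```java
import java.util.Map;

class MapFactorySketch {
    // Creates a map from a fully qualified class name, e.g. a value read
    // from the job configuration. The class needs a no-argument constructor.
    @SuppressWarnings("unchecked")
    static <K, V> Map<K, V> newMap(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!Map.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                    className + " does not implement java.util.Map");
            }
            return (Map<K, V>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Unknown class, missing constructor, or failed instantiation.
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMap("java.util.HashMap");
        map.put("rows", 1);
        System.out.println(map.getClass().getName()); // prints java.util.HashMap
    }
}
```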
This addresses bug HIVE-2555.
https://issues.apache.org/jira/browse/HIVE-2555
Diffs
-----
trunk/build-common.xml 1202523
trunk/conf/hive-default.xml 1202523
trunk/lib/hppc-0.4.1.jar UNKNOWN
trunk/lib/patricia-trie-0.6.jar UNKNOWN
trunk/ql/build.xml 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalHPPCObjectObjectOpenHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalPatriciaTrie.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ListKeyWrapperAnalyzer.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectObjectExpandedOpenHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/PrivateInstantiator.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/TextKeyWrapperAnalyzer.java PRE-CREATION
trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1202523
Diff: https://reviews.apache.org/r/2849/diff
Testing (updated)
-------
Worked on some sample queries with each implementation added.
Thanks,
Robert
> Make the hashmap in map-side group by pluggable
> -----------------------------------------------
>
> Key: HIVE-2555
> URL: https://issues.apache.org/jira/browse/HIVE-2555
> Project: Hive
> Issue Type: New Feature
> Reporter: Namit Jain
> Assignee: Robert Surówka
> Attachments: HIVE-2555.2.patch, HIVE-2555.3.patch, HIVE-2555.4.patch
>
>
> There are a couple of implementations available (other than
> java.util.HashMap) - COLT, TROVE etc. to name a few.
> If the hashmap was pluggable, it would be easy to play around with different
> hash maps and tune performance.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira