[
https://issues.apache.org/jira/browse/HIVE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151644#comment-13151644
]
[email protected] commented on HIVE-2555:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2849/
-----------------------------------------------------------
(Updated 2011-11-16 23:46:35.302887)
Review request for Yongqiang He, Ning Zhang and namit jain.
Changes
-------
I added an additional pluggable "hashMap" implementation: a trie. Currently it
only supports group by on a single string column. If I have time I will add
support for the rest (it's easy but time-consuming: basically, within the keys
of Listwrapper one needs to find which bytes belong to which key, and then use
the proper key analyzer for that key's primitive type).
I thought that having a trie as one of the pluggable options may be very
profitable, as a trie behaves significantly differently from a HashMap
(http://upload.wikimedia.org/wikipedia/commons/9/9a/BitwiseTreesScaling.png vs
http://upload.wikimedia.org/wikipedia/commons/c/c6/HashTableScaling.png - taken
from the wiki article about Trie). For now all those HashMap plugins are added
quite crudely, and there is room for improvement (e.g. to get a string from
TextWrappeKey, the key is first cast to Text and then toString() is called).
Therefore there may be various overheads, and removing them could change quite
a lot which pluggable implementation turns out to be best.
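
To illustrate why a trie behaves differently from a hash table for string keys,
here is a minimal sketch of a byte-wise trie that counts occurrences per key.
It is not the code from the patch and not the API of the patricia-trie jar: the
class name ByteTrieSketch and its methods are made up for illustration, and a
real PATRICIA trie would additionally compress single-child paths.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the ExternalPatriciaTrie from the patch.
final class ByteTrieSketch {

    /** One trie node; children are indexed by the unsigned value of the next key byte. */
    private static final class Node {
        final Node[] children = new Node[256];
        boolean isKey;   // true if a key ends exactly at this node
        long count;      // aggregate stored for that key
    }

    private final Node root = new Node();
    private int size;

    /** Adds one occurrence of the key, creating nodes along its byte path as needed. */
    void increment(String key) {
        Node node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            int index = b & 0xFF;
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        if (!node.isKey) {
            node.isKey = true;
            size++;
        }
        node.count++;
    }

    /** Returns the count stored for the key, or 0 if the key was never added. */
    long count(String key) {
        Node node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = node.children[b & 0xFF];
            if (node == null) {
                return 0;
            }
        }
        return node.isKey ? node.count : 0;
    }

    /** Number of distinct keys stored. */
    int size() {
        return size;
    }
}
```

Lookup cost here depends on the key length rather than on the number of stored
keys, and no hash of the whole key is computed; the price is memory per node,
which is what path compression in a PATRICIA trie is meant to reduce.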
Again - I don't know how to properly add jars to Hive, so they are added "just
to work".
Summary
-------
Made the hash table in group by pluggable: a class that supplies the hash
table functionality has to implement the ExternalMap interface. Currently I
supply 2 of them: ExternalJavaHashMap and ExternalHPPCObjectObjectOpenHashMap
(ExternalJavaMap is an abstract class that makes it easier to add hash maps
that implement the java.util.Map interface). ExternalMap has some strange
methods that allow various tricks to increase efficiency. ExternalMap could
easily be made more general, but I decided it's not worth doing at this point
(it would be if ExternalMap were also to be used by things other than
GroupByOperator).
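
As a rough sketch of the general idea (not the actual ExternalMap interface,
whose methods are not listed in this message), a pluggable map can be modelled
as a small interface plus an adapter over java.util.Map that is instantiated
from a class name. All names below (PluggableMapSketch, GroupByMap,
JavaMapAdapter, create, countKeys) are hypothetical.

```java
import java.util.Map;

// Hypothetical names throughout; GroupByMap only stands in for ExternalMap.
final class PluggableMapSketch {

    /** Minimal contract a group-by style caller would need from a backing map. */
    interface GroupByMap<K, V> {
        V get(K key);
        void put(K key, V value);
        int size();
        void clear();
    }

    /** Exposes any java.util.Map implementation through the contract above. */
    static final class JavaMapAdapter<K, V> implements GroupByMap<K, V> {
        private final Map<K, V> delegate;

        JavaMapAdapter(Map<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override public V get(K key) { return delegate.get(key); }
        @Override public void put(K key, V value) { delegate.put(key, value); }
        @Override public int size() { return delegate.size(); }
        @Override public void clear() { delegate.clear(); }
    }

    /** Creates a map from a class name, the way a configuration value might select it. */
    @SuppressWarnings("unchecked")
    static <K, V> GroupByMap<K, V> create(String mapClassName) {
        try {
            Object instance = Class.forName(mapClassName).getDeclaredConstructor().newInstance();
            return new JavaMapAdapter<>((Map<K, V>) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot use " + mapClassName + " as a map", e);
        }
    }

    /** Counts occurrences per key: the simplest form of map-side aggregation. */
    static GroupByMap<String, Long> countKeys(String mapClassName, String... keys) {
        GroupByMap<String, Long> counts = create(mapClassName);
        for (String key : keys) {
            Long current = counts.get(key);
            counts.put(key, current == null ? 1L : current + 1L);
        }
        return counts;
    }
}
```

With this shape, swapping implementations is a matter of passing a different
class name, e.g. countKeys("java.util.HashMap", ...) versus
countKeys("java.util.TreeMap", ...); a map that does not implement
java.util.Map would need its own adapter.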
I strongly dislike removing 10% of the hashmap in GroupByOperator.flush(), since
no HashMap implementation known to me supplies an efficient and clean way to do
it. Maybe something can be done about that flushing.
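
For reference, one generic way to drop a share of the entries through the
java.util.Map interface is removal via the entry-set iterator. The sketch below
is only an illustration of that approach, not the code in
GroupByOperator.flush(); the class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it does not reproduce GroupByOperator.flush().
final class PartialFlushSketch {

    /**
     * Removes roughly the given percentage of entries (rounded up) from the map,
     * in whatever order the map's iterator yields them, and returns the removed
     * entries so the caller can forward them downstream.
     */
    static <K, V> List<Map.Entry<K, V>> flushPercent(Map<K, V> map, int percent) {
        int toRemove = (map.size() * percent + 99) / 100;
        List<Map.Entry<K, V>> flushed = new ArrayList<>(toRemove);
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (toRemove > 0 && it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            // Copy the entry before removing it; the live entry must not be used afterwards.
            flushed.add(new AbstractMap.SimpleImmutableEntry<>(entry));
            it.remove();
            toRemove--;
        }
        return flushed;
    }
}
```

Which entries get removed is decided purely by iteration order, and maps that
do not expose an iterator with remove() would need a different mechanism, which
is part of why this step is awkward to make pluggable.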
At this point the hppc jar is added in a way to "just work", if there is a more
proper way of adding jars, then I am not aware how to do it.
This addresses bug HIVE-2555.
https://issues.apache.org/jira/browse/HIVE-2555
Diffs (updated)
-----
trunk/build-common.xml 1202523
trunk/conf/hive-default.xml 1202523
trunk/lib/hppc-0.4.1.jar UNKNOWN
trunk/lib/patricia-trie-0.6.jar UNKNOWN
trunk/ql/build.xml 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1202523
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalHPPCObjectObjectOpenHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalPatriciaTrie.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectObjectExpandedOpenHashMap.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/PrivateInstantiator.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/util/TextKeyWrapperAnalyzer.java PRE-CREATION
Diff: https://reviews.apache.org/r/2849/diff
Testing
-------
Worked on some sample queries and passed queries_properties.q
Thanks,
Robert
> Make the hashmap in map-side group by pluggable
> -----------------------------------------------
>
> Key: HIVE-2555
> URL: https://issues.apache.org/jira/browse/HIVE-2555
> Project: Hive
> Issue Type: New Feature
> Reporter: Namit Jain
> Assignee: Robert Surówka
> Attachments: HIVE-2555.2.patch, HIVE-2555.3.patch
>
>
> There are a couple of implementations available (other than
> java.util.HashMap) - COLT, TROVE etc. to name a few.
> If the hashmap was pluggable, it would be easy to play around with different
> hash maps and tune performance.