[
https://issues.apache.org/jira/browse/HIVE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965859#action_12965859
]
Alex Boisvert commented on HIVE-1700:
-------------------------------------
Duplicate of HIVE-1702
> Optimize JDBM to make mapjoin faster
> ------------------------------------
>
> Key: HIVE-1700
> URL: https://issues.apache.org/jira/browse/HIVE-1700
> Project: Hive
> Issue Type: Improvement
> Reporter: He Yongqiang
>
> copied from email:
> From: Joydeep Sen Sarma
> Sent: Tuesday, October 12, 2010 11:11 AM
> To: Yongqiang He; Liyin Tang; Namit Jain
> Subject: RE: Optimize jdbm
> It seems we should move all deserialization into Hive. JDBM should just
> work on byte arrays for both keys and values. (Since the output of the
> serializer used by Hive is byte-comparable, that seems to suffice.)
> ________________________________________
> From: Yongqiang He
> Sent: Tuesday, October 12, 2010 10:22 AM
> To: Liyin Tang; Namit Jain
> Cc: Joydeep Sen Sarma
> Subject: Optimize jdbm
> 1. HTree.get() costs 70% of the total time. A Bloom filter would help a lot
> here by avoiding unneeded get() calls when we know for sure the given key is
> not in JDBM. (We can generate the Bloom filter when doing the JDBM sink, and
> read it into memory when doing the read.)
> 2. HTree.get() deserializes both the key and the value of each entry until
> it finds a matching key. We could deserialize only the key, and defer
> deserializing the value until the key matches.
> Any others?
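
The two suggestions in the quoted email can be sketched roughly as follows. This is a minimal illustration, not JDBM or Hive code: the class names MapJoinLookupSketch, BloomFilter and Store, the filter size, the number of hash functions and the hash scheme are all assumptions made for the example. A flat list stands in for an HTree bucket. Keys are kept as serialized byte arrays and compared byte-wise, the Bloom filter is consulted before any scan, and a value is deserialized only after its key matches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class MapJoinLookupSketch {

    /** Tiny Bloom filter over byte[] keys (illustrative sizing and hashing). */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(byte[] key) {
            int h1 = Arrays.hashCode(key);
            int h2 = fnv1a(key);
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(h1, h2, i));
            }
        }

        /** False means the key is definitely absent; true means it may be present. */
        boolean mightContain(byte[] key) {
            int h1 = Arrays.hashCode(key);
            int h2 = fnv1a(key);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(h1, h2, i))) {
                    return false;
                }
            }
            return true;
        }

        // Double hashing: derive the i-th bit position from two base hashes.
        private int index(int h1, int h2, int i) {
            return Math.floorMod(h1 + i * h2, numBits);
        }

        // 32-bit FNV-1a, used here only as a second, independent-ish hash.
        private static int fnv1a(byte[] key) {
            int h = 0x811c9dc5;
            for (byte b : key) {
                h = (h ^ (b & 0xff)) * 16777619;
            }
            return h;
        }
    }

    /** Stand-in for the on-disk table: serialized keys and values as byte[]. */
    static final class Store {
        private final List<byte[]> keys = new ArrayList<>();
        private final List<byte[]> values = new ArrayList<>();
        private final BloomFilter filter = new BloomFilter(1 << 16, 3);
        int valueDeserializations = 0;

        void put(byte[] key, byte[] value) {
            keys.add(key);
            values.add(value);
            filter.add(key); // built while writing, as in the "sink" step
        }

        String get(byte[] key) {
            if (!filter.mightContain(key)) {
                return null; // definitely absent: skip the scan entirely
            }
            for (int i = 0; i < keys.size(); i++) {
                // Compare serialized key bytes directly; no key deserialization.
                if (Arrays.equals(keys.get(i), key)) {
                    valueDeserializations++; // value is decoded only on a match
                    return new String(values.get(i), StandardCharsets.UTF_8);
                }
            }
            return null; // Bloom filter false positive
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("k1".getBytes(StandardCharsets.UTF_8), "v1".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.get("k1".getBytes(StandardCharsets.UTF_8)));
        System.out.println(store.get("missing".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A Bloom filter can report false positives but never false negatives, so a miss in the filter safely skips the lookup, while a hit still needs the real comparison. How much this saves depends on how many probe keys are actually absent from the table.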