[ https://issues.apache.org/jira/browse/HADOOP-11829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hongbo Xu resolved HADOOP-11829.
--------------------------------
    Resolution: Invalid

> Improve the vector size of Bloom Filter from int to long, and storage from memory to disk
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11829
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11829
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Hongbo Xu
>            Assignee: Hongbo Xu
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> org.apache.hadoop.util.bloom.BloomFilter(int vectorSize, int nbHash, int hashType)
> This filter can hold at most about 900 million objects when the false-positive probability is 0.0001, and it needs 2.1 GB of RAM.
> In my project I needed to build a filter with a capacity of 2 billion, which needs 4.7 GB of RAM; the required vector size is 38,340,233,509, outside the range of int. I did not have that much RAM, so I rebuilt a big Bloom filter whose vector size is typed as long, split the bit data into several files on disk, and distributed the files to the worker nodes. The performance is very good.
> I think I can contribute this code to Hadoop Common, along with a 128-bit hash function (MurmurHash).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
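
For context on the sizing above: the standard Bloom-filter formula m = -n * ln(p) / (ln 2)^2 gives, for n = 2 * 10^9 keys at p = 0.0001, m = 38,340,233,509 bits, exactly the figure quoted, and far beyond Integer.MAX_VALUE (2,147,483,647), so an int-typed vectorSize cannot address it.

A minimal sketch of the scheme the reporter describes (a long-indexed bit vector segmented across memory-mapped files on disk, probed with double hashing) might look like the following. The class name DiskBackedBloomFilter, the 1 GiB segment size, and the splitmix64-style mixer are illustrative assumptions, not the reporter's actual code; the actual proposal pairs the filter with a 128-bit MurmurHash, which would supply two independent 64-bit hash halves directly.

{code:java}
// Hypothetical sketch only; not part of the org.apache.hadoop.util.bloom API.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class DiskBackedBloomFilter {

  // 2^33 bits = 1 GiB per segment file, so each MappedByteBuffer stays
  // well under Java's 2 GiB per-mapping limit.
  private static final long BITS_PER_SEGMENT = 1L << 33;

  private final long vectorSize; // long, so > Integer.MAX_VALUE bits is fine
  private final int nbHash;
  private final MappedByteBuffer[] segments;

  public DiskBackedBloomFilter(File dir, long vectorSize, int nbHash) throws IOException {
    this.vectorSize = vectorSize;
    this.nbHash = nbHash;
    int nSegments = (int) ((vectorSize + BITS_PER_SEGMENT - 1) / BITS_PER_SEGMENT);
    this.segments = new MappedByteBuffer[nSegments];
    for (int i = 0; i < nSegments; i++) {
      long bits = Math.min(BITS_PER_SEGMENT, vectorSize - (long) i * BITS_PER_SEGMENT);
      long bytes = (bits + 7) / 8;
      try (RandomAccessFile raf =
          new RandomAccessFile(new File(dir, "bloom-" + i + ".bits"), "rw")) {
        raf.setLength(bytes);
        // The mapping remains valid after the channel is closed.
        segments[i] = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, bytes);
      }
    }
  }

  public void add(byte[] key) {
    long h1 = hashBytes(key, 0x9e3779b97f4a7c15L);
    long h2 = hashBytes(key, 0xc2b2ae3d27d4eb4fL) | 1L; // odd stride
    for (int i = 0; i < nbHash; i++) {
      setBit(bitIndex(h1, h2, i));
    }
  }

  public boolean membershipTest(byte[] key) {
    long h1 = hashBytes(key, 0x9e3779b97f4a7c15L);
    long h2 = hashBytes(key, 0xc2b2ae3d27d4eb4fL) | 1L;
    for (int i = 0; i < nbHash; i++) {
      if (!getBit(bitIndex(h1, h2, i))) {
        return false;
      }
    }
    return true;
  }

  // Kirsch-Mitzenmacher double hashing: bit_i = (h1 + i*h2) mod vectorSize.
  private long bitIndex(long h1, long h2, int i) {
    return ((h1 + (long) i * h2) & Long.MAX_VALUE) % vectorSize; // non-negative
  }

  private void setBit(long bit) {
    MappedByteBuffer seg = segments[(int) (bit / BITS_PER_SEGMENT)];
    int byteIdx = (int) ((bit % BITS_PER_SEGMENT) >>> 3);
    seg.put(byteIdx, (byte) (seg.get(byteIdx) | (1 << (int) (bit & 7))));
  }

  private boolean getBit(long bit) {
    MappedByteBuffer seg = segments[(int) (bit / BITS_PER_SEGMENT)];
    int byteIdx = (int) ((bit % BITS_PER_SEGMENT) >>> 3);
    return (seg.get(byteIdx) & (1 << (int) (bit & 7))) != 0;
  }

  // Stand-in 64-bit mixer (splitmix64 finalizer, fed byte by byte); the JIRA
  // proposes a 128-bit MurmurHash here, yielding two 64-bit halves at once.
  private static long hashBytes(byte[] key, long seed) {
    long h = seed;
    for (byte b : key) {
      h = mix(h ^ (b & 0xffL));
    }
    return h;
  }

  private static long mix(long x) {
    x ^= x >>> 30; x *= 0xbf58476d1ce4e5b9L;
    x ^= x >>> 27; x *= 0x94d049bb133111ebL;
    return x ^ (x >>> 31);
  }
}
{code}

Segmenting at 1 GiB keeps every MappedByteBuffer below Java's 2 GiB per-mapping limit, and the per-segment files are exactly the units that could be distributed to worker nodes as the reporter describes.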