Thanks DuyHai,

I think the main problem with a bloom filter on all row keys & column names
is memory usage. However, if a CF has only hundreds of columns per row, the
total number of columns will be much smaller, so a bloom filter should be
feasible in that case, right? Is there a good way to switch what the bloom
filter is built on, row keys only or row keys + column names, either
automatically or through user configuration?
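
To put rough numbers on this, here is a minimal sketch of the sizing
arithmetic in Python. It assumes the approximation
false positive chance = 0.6185 ^ (m/n) from the slides linked below, a 10%
false positive chance (my assumption; it is the value that reproduces the
60 GB figure below), and 1 million rows. The function names are made up for
illustration and are not Cassandra APIs.

```python
import math


def bloom_filter_bits(n_keys, false_positive_chance):
    """Approximate filter size in bits, solving p = 0.6185 ** (m / n) for m."""
    bits_per_key = math.log(false_positive_chance) / math.log(0.6185)
    return n_keys * bits_per_key


def to_gigabytes(bits):
    """Convert bits to decimal gigabytes (10**9 bytes)."""
    return bits / 8 / 1e9


ROWS = 1_000_000  # row count taken from the example in this thread
P = 0.10          # assumed false positive chance, not stated in the thread

# Compare a filter on row keys only with filters on row keys + column names.
for columns_per_row in (1, 100, 100_000):
    n = ROWS * columns_per_row
    m = bloom_filter_bits(n, P)
    print(f"{columns_per_row:>7} keys/row: {m:.2e} bits, {to_gigabytes(m):.4f} GB")
```

Under these assumptions the size grows linearly with the number of keys, so
100 columns per row needs about a thousandth of the memory that 100 000
columns per row does.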

Thanks,
Philo Yang


2014-09-15 2:45 GMT+08:00 DuyHai Doan <doanduy...@gmail.com>:

> Hello Philo
>
>  Building a bloom filter on column names (what you call column key) is
> technically possible but very expensive in terms of memory usage.
>
>   The approximate formula to calculate space required by bloom filter can
> be found on slide 27 here:
> http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees
>
> false positive chance = 0.6185 ^ (m/n)  where m = number of bits for the
> filter and n = number of distinct keys
>
> For example, if you want to index 1 million rows, each having 100 000
> columns on average, you end up indexing 100 billion keys (row keys
> & column names) with the bloom filter.
>
>  By applying the above formula with a false positive chance of about 10%,
> m ≈ 4.8 * 10^11 bits ≈ 60 GB to allocate in RAM just for the bloom filter
> on all row keys & column names ...
>
>  Regards
>
>  Duy Hai DOAN
>
> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang <ud1...@gmail.com> wrote:
>
>> Hi all,
>>
>> After reading some docs, I found that the bloom filter is built on row
>> keys, not on column keys. Can anyone tell me why the bloom filter is not
>> built on column keys? Would it be a good idea to offer a table property
>> to choose whether the bloom filter is built on the row key or on the
>> full primary key?
>>
>> Thanks,
>> Philo Yang
>>
>>
>
