Re: Hash Joins vs. Bloom Filters / take 2

Peter Geoghegan Tue, 20 Feb 2018 15:25:46 -0800

On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.


It's generally true that you need about 9.6 bits per element to get a 1%
false positive rate. A 1% target could be considered far stricter than
necessary here.
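
For reference, the figure follows from the standard sizing formula for an
optimally configured Bloom filter, m/n = -ln(p) / (ln 2)^2. The sketch below
is only illustrative arithmetic, not anything from a patch; the function
names are made up for this example.

```python
import math

def bits_per_element(p):
    """Bits per element (m/n) for an optimally sized Bloom filter
    with target false positive rate p, using m/n = -ln(p) / (ln 2)^2."""
    return -math.log(p) / (math.log(2) ** 2)

def optimal_hash_count(p):
    """Number of hash functions k = (m/n) * ln 2 = -log2(p), rounded
    to the nearest integer (at least 1)."""
    return max(1, round(-math.log2(p)))

# Hypothetical target rates, chosen only to show how quickly the
# space requirement drops as the target is relaxed.
for p in (0.01, 0.05, 0.10):
    print(f"p={p:.2f}  bits/element={bits_per_element(p):.2f}  "
          f"k={optimal_hash_count(p)}")
# p=0.01  bits/element=9.59  k=7
# p=0.05  bits/element=6.24  k=4
# p=0.10  bits/element=4.79  k=3
```

Relaxing the target from 1% to 10% roughly halves the filter size under
this formula.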

Do we need to eliminate 99% of all hash join probes (that find nothing
to join on) to make this Bloom filter optimization worthwhile?
Personally, I doubt it.

> That's efficient, but it's not magic. It can still happen that the
> whole set can't fit in work_mem with an acceptable false positive
> rate.

A merge join is always going to be the better choice when extremely
memory constrained.
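
To put a rough number on the memory-constrained case, the same sizing
formula can be inverted: given m bits and n distinct keys, the best
achievable false positive rate is about exp(-(m/n) * (ln 2)^2). The sketch
below assumes, unrealistically, that all of work_mem goes to the filter
(the hash table needs memory too); the 4MB budget and 10 million keys are
example inputs, not measurements.

```python
import math

def best_false_positive_rate(mem_bytes, n_distinct):
    """Approximate lowest false positive rate for an optimally configured
    Bloom filter of mem_bytes bytes holding n_distinct keys."""
    bits_per_elem = (mem_bytes * 8) / n_distinct
    return math.exp(-bits_per_elem * math.log(2) ** 2)

# Assumed example: the entire 4MB budget is spent on the filter.
work_mem = 4 * 1024 * 1024
n_keys = 10_000_000

p = best_false_positive_rate(work_mem, n_keys)
print(f"bits/element={work_mem * 8 / n_keys:.2f}  best p={p:.3f}")
```

With these inputs the filter gets about 3.4 bits per element and the best
achievable rate is roughly 20%.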

-- 
Peter Geoghegan
