On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfre...@gmail.com> wrote: > I've worked a lot with bloom filters, and for large false positive > rates and large sets (multi-million entries), you get bloom filter > sizes of about 10 bits per distinct item.
It's generally true that you need 9.6 bits per element to get a 1% false positive rate. 1% could be considered much too low here. Do we need to eliminate 99% of all hash join probes (that find nothing to join on) to make this Bloom filter optimization worthwhile? Personally, I doubt it. > That's efficient, but it's not magic. It can still happen that the > whole set can't fit in work_mem with an acceptable false positive > rate. A merge join is always going to be the better choice when extremely memory constrained. -- Peter Geoghegan