On Tue, Feb 20, 2018 at 3:48 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
>> Do we need to eliminate 99% of all hash join probes (that find nothing
>> to join on) to make this Bloom filter optimization worthwhile?
>> Personally, I doubt it.
>
> Even for 90% it's about 4.6 bits per element.
4.6 bits is vastly less than the size of a typical hash join hash table
entry, which often comprises several attributes. Even the skinniest
possible hash table elements carry HJTUPLE_OVERHEAD() overhead as well as
MinimalTuple overhead, to say nothing of all the serialization and
synchronization overhead that you could also have. Whatever precise
per-element overhead you come up with for your Bloom filter, the *ratio*
of those two things is what makes using a Bloom filter potentially very
effective. It could end up being something like 1:100, which is a huge
win for a highly selective hash join that still has a fairly large hash
table (one that cannot fit in L3 cache/work_mem).

> What I'm saying is that you can't assume you can fit the whole thing
> in work_mem. If it could be said that any set worth a hash join will
> fit, you can build a work_mem-sized bloom filter and just accept that
> if you exceed its capacity its filtering efficiency will drop
> benignly. But IMO that can't be taken for granted, so at some point
> you should just give up, the overhead of querying a low-quality BF
> won't be worth it.

It's probably true that we will need to bail out of using a Bloom filter
in some cases, and it is certainly true that the optimization won't
always work out. Still, once your Bloom filter proves useless, chances
are good that the plan isn't a very good one, with or without that extra
optimization. Even if the plan is truly the fastest possible plan, and
even if you avoid wasting extra effort on the Bloom filter, it will still
be very slow.

Join selectivity is what really matters with this optimization. Of course
the optimization might not work out, but you can say that about almost
anything that uses hashing -- that's why safety-critical realtime systems
(e.g. avionics systems) often completely avoid hash tables. Bloom filters
are fairly resilient to somewhat inaccurate estimates, but nothing will
save you from terrible estimates. I think that it's worth risking a
small, predictable regression when the join turns out not to be
selective, in order to get a potentially very large improvement in
performance when the planner estimates join selectivity reasonably
accurately (and the hash join is selective but doesn't simply have a
small hash table to begin with).

--
Peter Geoghegan
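[Editorial note: to make the sizing arithmetic above concrete, here is a
minimal standalone C sketch of the standard Bloom filter formula
m/n = -ln(p) / (ln 2)^2 and the resulting filter-to-hash-table size
ratio. The 64-byte per-entry figure is only an assumed placeholder for
HJTUPLE_OVERHEAD() plus MinimalTuple overhead plus a couple of small
attributes; it is not a value taken from the PostgreSQL source tree.]

/*
 * Bloom filter sizing sketch: bits per element needed for a target
 * false positive rate, compared against an assumed hash table entry
 * size.  Build with: cc -o bloom_sizing bloom_sizing.c -lm
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* "eliminate 99%" and "eliminate 90%" of useless probes */
	const double fpr[] = {0.01, 0.10};

	/*
	 * Assumed per-element size of a hash table entry (tuple overhead
	 * plus a few attributes).  Placeholder only.
	 */
	const double hash_entry_bytes = 64.0;

	for (int i = 0; i < 2; i++)
	{
		double		p = fpr[i];

		/*
		 * Optimal bits per element for a Bloom filter with false
		 * positive rate p: m/n = -ln(p) / (ln 2)^2.
		 */
		double		bits_per_elem = -log(p) / (log(2.0) * log(2.0));
		double		bytes_per_elem = bits_per_elem / 8.0;

		printf("p = %.2f: %.1f bits/element (%.2f bytes), "
			   "filter:table size ratio ~ 1:%.0f\n",
			   p, bits_per_elem, bytes_per_elem,
			   hash_entry_bytes / bytes_per_elem);
	}

	return 0;
}

With those assumptions the program prints roughly 4.8 bits per element at
a 10% false positive rate and 9.6 bits at 1%, which is the same ballpark
as the 4.6-bit figure quoted above, and a filter-to-table ratio on the
order of 1:50 to 1:100.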