Hi,

On 09/19/2017 02:55 AM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 5:13 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> Uh, why does the planner need to be involved at all?
>>>
>>> Because it loses if the Bloom filter fails to filter anything. That's
>>> not at all far-fetched; consider SELECT * FROM a.x, b.x WHERE a.x =
>>> b.x given a foreign key on a.x referencing b(x).
>>
>> Wouldn't a merge join be a lot more likely in this case anyway? Low
>> selectivity hash joins with multiple batches are inherently slow; the
>> wasted overhead of using a bloom filter may not matter.
>>
>> Obviously this is all pretty speculative. I suspect that this could be
>> true, and it seems worth investigating that framing of the problem
>> first.
>
> ISTR Tomas Vondra doing some experiments with this a few years ago and
> finding that it was, in fact, a problem.
>
You seem to have a better memory than I do, but you're right - I did some
experiments with this in 2015; the WIP patch and discussion are here:

https://www.postgresql.org/message-id/5670946e.8070...@2ndquadrant.com

The whole idea was that with a bloom filter we can reduce the number of
tuples (from the outer relation) written to batches. The patch is fairly
simple and did not try to push the bloom filters to scan nodes or
anything like that. It might be a meaningful first step, though,
particularly for selective joins (where only a small number of rows from
the outer relation have a match in the hash table).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
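P.S. For anyone following along, here is a toy Python sketch of the idea
discussed above - populate a Bloom filter while building the hash table
from the inner relation, then test outer tuples against it so that tuples
with no possible match need not be written to a batch file. All names here
are invented for illustration; this is not the actual patch code, and a
real implementation would size the filter from the estimated row count.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: k hash positions per key, bit array of nbits."""
    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        # Derive nhashes independent bit positions by seeding the hash.
        for seed in range(self.nhashes):
            h = hashlib.blake2b(repr((seed, key)).encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

def hash_join_with_bloom(inner, outer):
    """Join two lists of (key, payload) tuples. The Bloom filter built
    during the build phase lets the probe phase discard outer tuples that
    cannot match - in a batched hash join, those tuples would otherwise
    be written to (and later read back from) a batch file."""
    table = {}
    bloom = BloomFilter()
    for key, payload in inner:              # build phase
        table.setdefault(key, []).append(payload)
        bloom.add(key)
    results, filtered = [], 0
    for key, payload in outer:              # probe phase
        if not bloom.might_contain(key):
            filtered += 1                    # skipped: no batch write needed
            continue
        for inner_payload in table.get(key, []):
            results.append((key, inner_payload, payload))
    return results, filtered
```

For a selective join most outer tuples fail the filter test, so the
savings come from the avoided batch I/O; when nearly everything matches,
the filter only adds overhead - which is exactly the planner concern
raised upthread.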