On Tue, Aug 29, 2017 at 7:22 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Indeed. Thank you for working on this! To summarise a couple of > ideas that Peter and I discussed off-list a while back: (1) While > building the hash table for a hash join we could build a Bloom filter > per future batch and keep it in memory, and then while reading from > the outer plan we could skip writing tuples out to future batches if > there is no chance they'll find a match when read back in later (works > only for inner joins and only pays off in inverse proportion to the > join's selectivity);
Right. Hash joins do tend to be very selective, though, so I think that this could help rather a lot. With just 8 or 10 bits per element, you can eliminate almost all the batch write-outs on the outer side. No per-worker synchronization for BufFiles is needed when this happens, either. It seems like that could be very important. > To use this for anything involving parallelism where a Bloom filter > must be shared we'd probably finish up having to create a shared > version of bloom_init() that either uses caller-provided memory and > avoids the internal pointer, or allocates DSA memory. I suppose you > could consider splitting your bloom_init() function up into > bloom_estimate() and bloom_init(user_supplied_space, ...) now, and > getting rid of the separate pointer to bitset (ie stick char > bitset[FLEXIBLE_ARRAY_MEMBER] at the end of the struct)? Makes sense. Not hard to add that. > I was just observing that there is an opportunity for code reuse here. > This function's definition would ideally be something like: > > double > bloom_prop_bits_set(bloom_filter *filter) > { > return popcount(filter->bitset, NBYTES(filter)) / (double) NBITS(filter); > } > > This isn't an objection to the way you have it, since we don't have a > popcount function yet! We have several routines in the tree for > counting bits, though not yet the complete set from Hacker's Delight. Right. I'm also reminded of the lookup tables for the visibility/freeze map. > Your patch brings us one step closer to that goal. (The book says > that this approach is good far sparse bitsets, but your comment says > that we expect something near 50%. That's irrelevant anyway since a > future centralised popcount() implementation would do this in > word-sized chunks with a hardware instruction or branch-free-per-word > lookups in a table and not care at all about sparseness.) I own a copy of Hacker's Delight (well, uh, Daniel Farina lent me his copy about 2 years ago!). pop()/popcount() does seem like a clever algorithm, that we should probably think about adopting in some cases, but I should point at that the current caller to my bloom_prop_bits_set() function is an elog() DEBUG1 call. This is not at all performance critical. > + * Test if bloom filter definitely lacks element. > > I think where "Bloom filter" appears in prose it should have a capital > letter (person's name). Agreed. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers