On Fri, Sep 18, 2020 at 6:27 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> But maybe we could still use this scheme by actually computing
>
>   h1 = hash_uint32_extended(value, seed1);
>   h2 = hash_uint32_extended(value, seed2);
>
> and then use this as the independent hash functions. I think that would
> meet the requirements of the paper.

Yeah, that would work algorithmically. It would be trivial to add to
the patch, too, of course. There'd be a higher up-front CPU cost.
Also, I'm a bit cautious about rehashing hashes, and whether the two
values above are independent enough. I'm not sure either of these
points matters. My guess is the partition approach is more sound, but
it has some minor organizational challenges (see below).

> Why would 11k bits be more than we'd like to store? Assuming we could
> use the whole 8kB page for the bloom filter, that'd be about 64k bits.
> In practice there'd be a bit of overhead (page header ...) but it's
> still much more than 11k bits.

Brain fade -- I guess I thought we were avoiding being toasted, but
now I see that's not possible for BRIN storage. So, we'll want to
guard against this:

ERROR:  index row size 8160 exceeds maximum 8152 for index "bar_num_idx"

While playing around with the numbers I had an epiphany: at the
defaults, the filter already takes up ~4.3kB, over half the page.
There is no room for another tuple, so if we're only indexing one
column, we might as well take up the whole page. Here MT = max tuples
per 128 8kB pages, or 37120, so default ndistinct is 3712.

 n      k   m      p
 MT/10  7   35580  0.01
 MT/10  7   64000  0.0005
 MT/10  12  64000  0.00025

Assuming ndistinct isn't way off from reality, we get a 20x-40x lower
false positive rate almost for free, and it'd be trivial to code!
Keeping k at 7 would be more robust, since it's equivalent to starting
with n = ~6000, p = 0.006, which is still almost 2x fewer false
positives than you asked for. It also means nearly doubling the number
of sorted values before switching.

Going the other direction, capping nbits at 64k bits when ndistinct
gets too high, the false positive rate we can actually support starts
to drop. Here, the user requested 0.001 fpr.

 n      k  p
 4500   9  0.001
 6000   7  0.006
 7500   6  0.017
 15000  3  0.129  (probably useless by now)
 MT     1  0.440
 64000  1  0.63   (possible with > 128 pages per range)

I imagine smaller pages_per_range settings are going to be useful for
skinny tables (note to self -- test). Maybe we could provide a way for
the user to see whether their combination of pages_per_range, false
positive rate, and ndistinct is supportable, like
brin_bloom_get_supported_fpr(). Or document to check with pageinspect.
And that's not even considering multi-column indexes, like you
mentioned.
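For reference, the numbers in those two tables come straight out of
the textbook formulas; a throwaway program like the one below (not
part of the patch, just a sanity check) reproduces the first table,
and plugging in the second table's n and k values gives its p column
as well:

/*
 * Sanity check of the tables above using the standard Bloom filter
 * formulas (the last one is the usual approximation):
 *
 *   m = -n * ln(p) / (ln 2)^2     bits needed for n items at fpr p
 *   k = (m / n) * ln 2            optimal number of hash functions
 *   p = (1 - e^(-k*n/m))^k        expected fpr for given n, k, m
 */
#include <math.h>
#include <stdio.h>

static void
report(double n, double k, double m)
{
	double		p = pow(1.0 - exp(-k * n / m), k);

	printf("n = %6.0f  k = %2.0f  m = %6.0f  p = %.5f\n", n, k, m, p);
}

int
main(void)
{
	double		n = 37120.0 / 10;	/* MT/10 = default ndistinct */
	double		m = ceil(-n * log(0.01) / pow(log(2.0), 2.0));	/* ~35580 */
	double		k = round(m / n * log(2.0));	/* 7 */

	report(n, k, m);			/* current default: p ~ 0.01 */
	report(n, k, 64000);		/* same k, whole page: p ~ 0.0005 */
	report(n, 12, 64000);		/* optimal k for 64000 bits: p ~ 0.00025 */
	return 0;
}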
> But I guess we can simply make the table
> of primes a bit larger, right?

If we want to support all the above cases without falling down
entirely, it would have to go up to 32k to be safe (when k = 1 we
could degenerate to one modulus on the whole filter). That would be a
table of about 7kB, which we could binary search. [thinks for a
moment] ...Actually, if we wanted to be clever, maybe we could
precalculate the primes needed for the 64k-bit cases and stick them at
the end of the array. The usual algorithm will find them. That way, we
could keep the array around 2kB. However, for block sizes larger than
8kB we couldn't adjust the 64k number, which might be okay, but that's
not really project policy.

We could also generate the primes with a sieve instead, which is
really fast (and more code). That would be more robust, but it would
require the filter to store the actual primes used, so 20 more bytes
at max k = 10. We could hard-code space for that, or, to keep from
hard-coding the maximum k and thus the lowest possible false positive
rate, we'd need more bookkeeping.

So, with the partition approach, we'd likely have to set in stone
either max nbits or the minimum target false positive rate. The latter
option seems more principled, not only because of the block size, but
also since the target fp rate is already fixed by the reloption, and
as I've demonstrated, we can often go above and beyond the reloption
even without changing k.

> > One wrinkle is that the sum of k primes is not going to match m
> > exactly. If the sum is too small, we can trim a few bits off of the
> > filter bitmap. If the sum is too large, the last partition can spill
> > into the front of the first one. This shouldn't matter much in the
> > common case since we need to round m to the nearest byte anyway.
>
> AFAIK the paper simply says that as long as the sum of partitions is
> close to the requested nbits, it's good enough. So I guess we could
> just roll with that, no need to trim/wrap or something like that.

Hmm, I'm not sure I understand you. I can see not caring to trim
wasted bits, but we can't set/read off the end of the filter. If we
don't wrap, we could just skip reading/writing that bit, so a tiny
portion of accesses would effectively use k - 1 bits. The paper is not
clear on what to do here, but it is explicit about minimizing the
absolute difference, which could fall on either side.

Also, I found a bug:

+	add_local_real_reloption(relopts, "false_positive_rate",
+							 "desired false-positive rate for the bloom filters",
+							 BLOOM_DEFAULT_FALSE_POSITIVE_RATE,
+							 0.001, 1.0, offsetof(BloomOptions, falsePositiveRate));

When I set the fp rate to 1.0, the reloption code is okay with that,
but then later the assertion gets triggered.

--
John Naylor                https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services