On Sun, Sep 13, 2020 at 12:40 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > <20200913 patch set>
Hi Tomas, The cfbot fails to apply this, but with 0001 from 0912 it works on my end, so going with that. One problem I have is I don't get success with the new reloptions: create index cd_multi on iot using brin(create_dt timestamptz_minmax_multi_ops) with (values_per_range = 64); ERROR: unrecognized parameter "values_per_range" create index on iot using brin(create_dt timestamptz_bloom_ops) with (n_distinct_per_range = 16); ERROR: unrecognized parameter "n_distinct_per_range" Aside from that, I'm going to try to understand the code, and ask questions. Some of the code might still change, but I don't think it's too early to do some comment and docs proofreading. I'll do this in separate emails for bloom and multi-minmax to keep it from being too long. Perf testing will come sometime later. Bloom ----- + greater than 0.0 and smaller than 1.0. The default values is 0.01, + rows per block). The default values is <literal>-0.1</literal>, and s/values/value/ + the minimum number of distinct non-null values is <literal>128</literal>. I don't see 128 in the code, but I do see this, is this the intention?: #define BLOOM_MIN_NDISTINCT_PER_RANGE 16 + * Bloom filters allow efficient test whether a given page range contains + * a particular value. Therefore, if we summarize each page range into a + * bloom filter, we can easily and cheaply test wheter it containst values + * we get later. s/test/testing/ s/wheter it containst/whether it contains/ + * The index only supports equality operator, similarly to hash indexes. s/operator/operators/ + * The number of distinct elements (in a page range) depends on the data, + * we can consider it fixed. This simplifies the trade-off to just false + * positive rate vs. size. Sounds like the first sentence should start with "although". + * of distinct items to be stored in the filter. We can't quite the input + * data, of course, but we may make the BRIN page ranges smaller - instead I think you accidentally a word. + * Of course, good sizing decisions depend on having the necessary data, + * i.e. number of distinct values in a page range (of a given size) and + * table size (to estimate cost change due to change in false positive + * rate due to having larger index vs. scanning larger indexes). We may + * not have that data - for example when building an index on empty table + * it's not really possible. And for some data we only have estimates for + * the whole table and we can only estimate per-range values (ndistinct). and + * The current logic, implemented in brin_bloom_get_ndistinct, attempts to + * make some basic sizing decisions, based on the table ndistinct estimate. + * XXX We might also fetch info about ndistinct estimate for the column, + * and compute the expected number of distinct values in a range. But Maybe I'm missing something, but the first two comments don't match the last one -- I don't see where we get table ndistinct, which I take to mean from the stats catalog? + * To address these issues, the opclass stores the raw values directly, and + * only switches to the actual bloom filter after reaching the same space + * requirements. IIUC, it's after reaching a certain size (BLOOM_MAX_UNSORTED * 4), so "same" doesn't make sense here. + /* + * The 1% value is mostly arbitrary, it just looks nice. + */ +#define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01 /* 1% fp rate */ I think we'd want better stated justification for this default, even if just precedence in other implementations. Maybe I can test some other values here? + * XXX Perhaps we could save a few bytes by using different data types, but + * considering the size of the bitmap, the difference is negligible. Yeah, I think it's obvious enough to leave out. + m = ceil((ndistinct * log(false_positive_rate)) / log(1.0 / (pow(2.0, log(2.0))))); I find this pretty hard to read and pgindent is going to mess it up further. I would have a comment with the formula in math notation (note that you can dispense with the reciprocal and just use negation), but in code fold the last part to a constant. That might go against project style, though: m = ceil(ndistinct * log(false_positive_rate) * -2.08136); + * XXX Maybe the 64B min size is not really needed? Something to figure out before commit? + /* assume 'not updated' by default */ + Assert(filter); I don't see how these are related. + big_h = ((uint32) DatumGetInt64(hash_uint32(value))); I'm curious about the Int64 part -- most callers use the bare value or with DatumGetUInt32(). Also, is there a reference for the algorithm for hash values that follows? I didn't see anything like it in my cursory reading of the topic. Might be good to include it in the comments. + * Tweak the ndistinct value based on the pagesPerRange value. First, Nitpick: "Tweak" to my mind means to adjust an existing value. The above is only true if ndistinct is negative, but we're really not tweaking, but using it as a scale factor. Otherwise it's not adjusted, only clamped. + * XXX We can only do this when the pagesPerRange value was supplied. + * If it wasn't, it has to be a read-only access to the index, in which + * case we don't really care. But perhaps we should fall-back to the + * default pagesPerRange value? I don't understand this. +static double +brin_bloom_get_fp_rate(BrinDesc *bdesc, BloomOptions *opts) +{ + return BloomGetFalsePositiveRate(opts); +} The body of the function is just a macro not used anywhere else -- is there a reason for having the macro? Also, what's the first parameter for? Similarly, BloomGetNDistinctPerRange could just be inlined or turned into a function for readability. + * or null if it is not exists. s/is not exists/does not exist/ + /* + * XXX not sure the detoasting is necessary (probably not, this + * can only be in an index). + */ Something to find out before commit? + /* TODO include the sorted/unsorted values */ Patch TODO or future TODO? -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services