Hi,

For 0007-Remove-the-special-batch-mode-use-a-larger--20210203.patch :

+            /* same as preceding value, so store it */
+            if (compare_values(&range->values[start + i - 1],
+                               &range->values[start + i],
+                               (void *) &cxt) == 0)
+                continue;
+
+            range->values[start + n] = range->values[start + i];

It seems the comment doesn't match the code: the value is stored only
when it differs from the preceding one, i.e. duplicates are skipped.
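Perhaps something like this instead (same code as in the patch, only
the comment reworded):

+            /* skip the value if it's the same as the preceding one,
+             * otherwise store it */
+            if (compare_values(&range->values[start + i - 1],
+                               &range->values[start + i],
+                               (void *) &cxt) == 0)
+                continue;
+
+            range->values[start + n] = range->values[start + i];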
For has_matching_range():

+    int midpoint = (start + end) / 2;

I think the standard idiom for the midpoint is start + (end - start) / 2,
which avoids integer overflow when start + end exceeds INT_MAX.

+    /* this means we ran out of ranges in the last step */
+    if (start > end)
+        return false;

It seems the above check should come before the midpoint is computed,
so that we never compute a midpoint for an empty range. A similar
comment applies to the code in AssertCheckRanges().
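To illustrate both points, a generic binary-search skeleton (just a
sketch, not the actual function or data structures from the patch)
would look like this:

    static bool
    contains_value(int *values, int nvalues, int target)
    {
        int     start = 0;
        int     end = nvalues - 1;

        /* stop as soon as we run out of candidates */
        while (start <= end)
        {
            /* overflow-safe midpoint */
            int     midpoint = start + (end - start) / 2;

            if (values[midpoint] == target)
                return true;
            else if (values[midpoint] < target)
                start = midpoint + 1;
            else
                end = midpoint - 1;
        }

        return false;
    }

Here the emptiness check (start <= end) happens before the midpoint is
computed, and the midpoint computation cannot overflow.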
Cheers

On Wed, Feb 3, 2021 at 3:55 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> Hi,
>
> Here's an updated and significantly improved version of the patch
> series, particularly the multi-minmax part. I've fixed a number of
> stupid bugs in that, discovered by either valgrind or stress-tests.
>
> I was surprised by some of the bugs, or rather that the existing
> regression tests failed to crash on them, so it's probably worth
> briefly discussing the details. There were two main classes of such
> bugs:
>
>
> 1) missing datumCopy
>
> AFAICS this happened because there were a couple of missing datumCopy
> calls, and BRIN covers multiple pages, so with by-ref data types we
> added a pointer but the buffer might have gone away unexpectedly.
> Regular regression tests passed just fine, because brin_multi runs
> almost separately, so the chance of the buffer being evicted was low.
> Valgrind reported this (with a rather enigmatic message, as usual),
> and so did a simple stress-test creating many indexes concurrently.
> Anything causing aggressive eviction of buffers would do the trick, I
> think, triggering segfaults, asserts, etc.
>
>
> 2) bogus opclass definitions
>
> There were a couple of opclasses referencing an incorrect distance
> function, intended for a different data type. I was scratching my
> head WTH the regression tests pass, as there is a table to build
> multi-minmax indexes on all supported data types. The reason is
> pretty silly - the table is very small, just 100 rows, with very low
> fillfactor (so only a couple of values per page), and the index was
> created with pages_per_range=1. So the compaction was not triggered
> and the distance function was never actually called. I've decided to
> build the indexes on a larger data set first, to test this. But maybe
> this needs a somewhat different approach.
>
>
> BLOOM
> -----
>
> The attached patch series addresses comments from the last review. As
> for the size limit, I've defined a new macro
>
>     #define BloomMaxFilterSize \
>         MAXALIGN_DOWN(BLCKSZ - \
>                       (MAXALIGN(SizeOfPageHeaderData + \
>                                 sizeof(ItemIdData)) + \
>                        MAXALIGN(sizeof(BrinSpecialSpace)) + \
>                        SizeOfBrinTuple))
>
> and use that to determine if the bloom filter is too large. IMO
> that's close enough, considering that this is a best-effort check
> anyway (due to not being able to consider multi-column indexes).
>
>
> MINMAX-MULTI
> ------------
>
> As mentioned, there's a lot of fixes and improvements in this part,
> but the basic principle is still the same. I've kept it split into
> three parts with different approaches to building, so that it's
> possible to do benchmarks and comparisons, and pick the best one.
>
> a) 0005 - Aggressively compacts the summary, by always keeping it
> within the limit defined by values_per_range. So if the range
> contains more values, this may trigger compaction very often in some
> cases (e.g. for monotonic series).
>
> One drawback is that the more often the compactions happen, the less
> optimal the result is - the algorithm is kinda greedy, picking
> something like local optima in each step.
>
> b) 0006 - Batch build, exactly the opposite of 0005. Accumulates all
> values in a buffer, then does a single round of compaction at the
> very end. This obviously doesn't have the "greediness" issues, but it
> may consume quite a bit of memory for some data types and/or indexes
> with large BRIN ranges.
>
> c) 0007 - A hybrid approach, using a buffer that is a multiple of the
> user-specified value, with some safety min/max limits. IMO this is
> what we should use, although perhaps with some tuning of the exact
> limits.
>
>
> Attached is a spreadsheet with benchmark results for each of those
> three approaches, on different data types (byval/byref), data set
> types, index parameters (pages/values per range), etc. I think 0007
> is a reasonable compromise overall, with performance somewhere in
> between 0005 and 0006. Of course, there are cases where it's somewhat
> slow, e.g. for data types with expensive comparisons and data sets
> forcing frequent compactions, in which case it's ~10x slower compared
> to regular minmax (in most cases it's ~1.5x). Compared to btree, it's
> usually much faster - ~2-3x as fast (except for some extreme cases,
> of course).
>
>
> As for the opclasses for indexes without a "natural" distance
> function, implemented in 0008, I think we should drop them. In theory
> they work, but I'm far from convinced they're actually useful in
> practice. Essentially, if you have a data type with ordering but
> without a meaningful concept of distance, it's hard to say what is an
> outlier. I'd bet most of those data types are used as "labels" where
> even the ordering is kinda useless, i.e. hardly anyone uses range
> queries on things like names, it's all just equality searches. Which
> means the bloom indexes are a much better match for this use case.
>
>
> The other thing we were considering was using the new multi-minmax
> opclasses as the default ones, replacing the existing minmax ones.
> IMHO we shouldn't do that either. For existing minmax indexes that's
> useless (the opclass seems to be working, otherwise the index would
> be dropped). But even for new indexes I'm not sure it's the right
> thing, so I don't plan to change this.
>
>
> I'm also attaching the stress-test that I used to test the hell out
> of this, creating indexes on various data sets, data types, with
> varying index parameters, etc.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
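P.S. Regarding 0007: IIUC, the hybrid sizing boils down to something
like the following (the factor and limits here are made-up
placeholders for illustration, not the values used in the patch):

    /* buffer a multiple of the user-specified values_per_range,
     * clamped to some safety limits (placeholder constants) */
    #define MINMAX_BUFFER_FACTOR    10
    #define MINMAX_BUFFER_MIN       256
    #define MINMAX_BUFFER_MAX       8192

    int     buffer_size = Min(Max(values_per_range * MINMAX_BUFFER_FACTOR,
                                  MINMAX_BUFFER_MIN),
                              MINMAX_BUFFER_MAX);

If that's the idea, +1 for tuning the exact limits based on the
benchmark results.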