On Thu, Aug 29, 2019 at 5:13 AM Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:
> Your explanation helped me to understand that this approach can be
> extended to the case of insertion into posting list, that doesn't
> trigger posting split, and that nbtsplitloc indeed doesn't need to
> know about posting tuples specific.
> The code is much cleaner now.
Fantastic!

> Some individual indexes are larger, some are smaller compared to the
> expected output.

I agree that v9 might be ever so slightly more space efficient than v5 was, on balance. In any case, v9 completely fixes the regression that I saw in the last version. I have pushed the changes to the test output for the serial tests that I privately maintain, which I gave you access to. The MGD test output also looks perfect.

We may find that deduplication is a little too effective, in the sense that it packs so many tuples onto leaf pages that *concurrent* inserters will tend to get excessive page splits. We may find that it makes sense to aim for posting lists that are maybe 96% of BTMaxItemSize() -- note that BTREE_SINGLEVAL_FILLFACTOR is 96 for this very reason. Concurrent inserters will tend to have heap TIDs that are slightly out of order, so we want to leave at least some free space on the left half of a "single value mode" split. We may end up with a design where deduplication anticipates what will be useful for nbtsplitloc.c.

I still think that it's too early to start worrying about problems like this one -- I feel it will be useful to continue to focus on the code and on the space utilization of the serial test cases for now. We can look into it at the same time that we think about adding back something like BT_COMPRESS_THRESHOLD. I am mentioning it now because it's probably a good time for you to start thinking about it, if you haven't already. (Actually, maybe I'm just describing what BT_COMPRESS_THRESHOLD was supposed to do in the first place.)

We'll need to have a good benchmark to assess these questions, and it's not obvious what that should be. Two possible candidates are TPC-H and TPC-E. (Of course, I mean running them for real -- not just using their indexes to make sure that the nbtsplitloc.c stuff works well in isolation.) Any thoughts on a conventional benchmark that would allow us to understand the patch's impact on both throughput and latency?

BTW, I notice that we often end up with indexes that are quite a lot smaller when they were created with retail insertions rather than with CREATE INDEX/REINDEX. This is not new, but the difference is much larger than it typically is without the patch. For example, the TPC-E index on trade.t_ca_id (which is named "i_t_ca_id" or "i_t_ca_id2" in my test) is 162 MB with CREATE INDEX/REINDEX, and 121 MB with retail insertions (assuming the insertions use the actual order from the test). I'm not sure what to do about this, if anything. I mean, the reason that the retail insertions do better is that they get the benefit of the nbtsplitloc.c stuff, and that we don't split the page until it's 100% full and deduplication stops helping -- we can apply several rounds of deduplication before we actually have to split the page. So the difference that we see here is logical, not surprising. How do you feel about this CREATE INDEX index-size-is-larger business?
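Circling back to the headroom idea: here is a rough standalone sketch of the arithmetic that a capped deduplication pass might use. To be clear, this is just an illustration -- the constants and helper names (MAX_POSTING_SIZE, dedup_posting_is_full, and so on) are hypothetical stand-ins, not code from the patch:

    /*
     * Standalone sketch of the "leave headroom" idea discussed above.
     * The numbers are illustrative stand-ins: 2704 bytes approximates
     * BTMaxItemSize() for an 8KB page, and 96 mirrors
     * BTREE_SINGLEVAL_FILLFACTOR.  None of these names are from the patch.
     */
    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_POSTING_SIZE    2704    /* stand-in for BTMaxItemSize() */
    #define DEDUP_FILLFACTOR    96      /* mirrors BTREE_SINGLEVAL_FILLFACTOR */
    #define TID_SIZE            6       /* sizeof(ItemPointerData) heap TID */

    /*
     * Would appending one more heap TID to the pending posting list leave
     * less than (100 - DEDUP_FILLFACTOR)% of MAX_POSTING_SIZE as headroom
     * for slightly-out-of-order concurrent inserters?
     */
    static bool
    dedup_posting_is_full(size_t base_tuple_size, int ntids)
    {
        size_t  cap = (size_t) MAX_POSTING_SIZE * DEDUP_FILLFACTOR / 100;

        return base_tuple_size + (size_t) (ntids + 1) * TID_SIZE > cap;
    }

    int
    main(void)
    {
        size_t  base = 16;      /* hypothetical base tuple size, in bytes */
        int     ntids = 0;

        /* merge heap TIDs until the capped posting list is "full" */
        while (!dedup_posting_is_full(base, ntids))
            ntids++;

        printf("cap = %zu bytes, room for %d TIDs\n",
               (size_t) MAX_POSTING_SIZE * DEDUP_FILLFACTOR / 100, ntids);
        return 0;
    }

The point of the cap is simply that the bytes we decline to use during deduplication become the slack that concurrent inserters with slightly out-of-order heap TIDs need -- the same reasoning behind BTREE_SINGLEVAL_FILLFACTOR being 96.

--
Peter Geoghegan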