On Tue, Jul 2, 2019 at 3:51 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I've already written a rough patch that fixes the issue by taking this
> second view of the problem. The patch makes nbtsplitloc.c more
> skeptical about finishing with the "many duplicates" strategy,
> avoiding the problem -- it can just fall back on a 50:50 page split
> when it looks like this is happening (the related "single value"
> strategy must already do something similar in _bt_strategy()).
> Currently, it simply considers if the new item on the page has an
> offset number immediately to the right of the split point indicated by
> the "many duplicates" strategy. We look for it within ~10 offset
> positions to the right, since that strongly suggests that there aren't
> that many duplicates after all.

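To make the quoted heuristic concrete, here is a minimal standalone sketch of the check. It is not code from the attached patch: the type, enum, macro, and function names are all hypothetical, the window of 10 is simply the "~10 offset positions" figure read literally, and treating a new item at exactly the split offset as "not to the right" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an nbtree offset number; offsets are 1-based. */
typedef unsigned short OffsetNumber;

/* Assumed window size, read literally from "~10 offset positions". */
#define NEWITEM_WINDOW 10

/* Hypothetical names for the two outcomes discussed in the quoted text. */
typedef enum
{
    SPLIT_MANY_DUPLICATES,
    SPLIT_DEFAULT_5050
} SplitStrategy;

/*
 * Is the new item within NEWITEM_WINDOW offsets to the right of the split
 * point that the "many duplicates" strategy picked?  (Assumption: an item
 * at exactly the split offset does not count as "to the right".)
 */
static bool
newitem_near_right_of_split(OffsetNumber splitoff, OffsetNumber newitemoff)
{
    return newitemoff > splitoff &&
           newitemoff - splitoff <= NEWITEM_WINDOW;
}

/*
 * Be skeptical about finishing with "many duplicates": if the new item
 * lands just right of the proposed split point, there probably aren't
 * that many duplicates after all, so fall back on a 50:50 split.
 */
static SplitStrategy
choose_final_strategy(OffsetNumber splitoff, OffsetNumber newitemoff)
{
    assert(splitoff >= 1 && newitemoff >= 1);

    if (newitem_near_right_of_split(splitoff, newitemoff))
        return SPLIT_DEFAULT_5050;
    return SPLIT_MANY_DUPLICATES;
}
```

With a proposed split at offset 50, a new item at offset 51 or 60 would trigger the 50:50 fallback in this sketch, while a new item at offset 61, or anywhere at or to the left of the split point, would leave the "many duplicates" choice alone.
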
Attached draft patch shows what I have in mind.

I can't think of another case that will make nbtsplitloc.c do the
wrong thing, so I am cautiously optimistic about this being the last
we'll hear about cases where we *consistently* do the wrong thing
because somebody got very unlucky *once*.

I continue to maintain the test suite used to develop the v12
enhancements to nbtree. These are mostly smoke tests which take a long
time to run, but there are a few particularly ticklish behaviors that
merit inclusion in the standard regression test suite. I wonder if it
would make sense to add some tests of the new nbtsplitloc.c behaviors
to the regression tests of contrib/pgstattuple, including the behavior
that this patch is concerned with, as well as the "split after new
tuple" behavior -- we could do something with pgstatindex()'s
avg_leaf_density field to make that work.

These tests would need to work in a portable fashion, while still
being effective as tests, but that shouldn't be too difficult. The
leaf space utilization very often looks *identical* to what you'll see
with rightmost page splits when the "split after new tuple"
optimization is applied, for example. The tests will need to be
tolerant of variations in page layout due to alignment and BLCKSZ
differences, but the tolerance can probably be quite small. Maybe +/-
5%.

--
Peter Geoghegan
0002-Fix-pathological-page-split-issue.patch
Description: Binary data