On Tue, Jul 17, 2018 at 3:10 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Well, the actual problem was O(N^2) behavior: > > https://www.postgresql.org/message-id/2378.967216388%40sss.pgh.pa.us > > https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40549e9cb5abd2986603883e4ab567dab34723c6
Oh, yeah. We still put the new tuple to the left of all existing duplicates on the leaf that we decide to insert on, because "we'll insert it before any existing equal keys because of the way _bt_binsrch() works". I actually plan on mostly fixing that aspect of _bt_binsrch() while I'm at it, which is why I didn't think to frame it that way. It is claimed that we need to do this _bt_binsrch() go-left-on-equal thing for internal pages because we allow duplicates, contrary to L&Y. I find it much easier to think of it as being necessary due to index scans where only some columns are specified in the initial insertion scan key that finds the initial leaf page (i.e. the _bt_first() stuff). I'm not going to touch the _bt_binsrch() go-left-on-equal thing, because it's actually necessary for any real world implementation that has multiple columns in tuples. Fortunately, we can usually avoid the go-left-on-equal thing for internal pages by avoiding equality -- a sentinel heap scan TID value may be used for this in many cases. The Lehman and Yao paper is badly written. They should have talked about the _bt_binsrch() go-left-on-equal thing, rather than leaving it to the reader to figure it out. It's not why L&Y require unique keys, though. Sorry if all this detail isn't useful right now. The important point is that I actually think that this code gets most things right already. > I certainly have no objection to improving matters, but let's be sure > we don't re-introduce any two-decade-old problems. A mountain of work remains to validate the performance of the patch. -- Peter Geoghegan