On Wed, Sep 2, 2020 at 1:57 AM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Wed, Aug 26, 2020 at 1:46 AM John Naylor <john.nay...@2ndquadrant.com> wrote:
> > The fact that that logic extends by 20 * numwaiters to get optimal
> > performance is a red flag that resources aren't being allocated
> > efficiently.
>
> I agree that that's pretty suspicious.

Per Simon off-list some time ago (if I understood him correctly),
counting the lock waiters makes the lock contention worse. I haven't
tried to measure this, but instead of counting waiters I had the idea
of keeping track of how many times the table has previously been
extended by multiple blocks, and extending by a number calculated from
that: something like pow2(2 + num-times-ext-mult-blocks), with some
ceiling perhaps much smaller than 512 (rough sketch near the end of
this mail). Maybe a bit off topic, but as you've mentioned, the general
problem you brought up has many moving parts.

> > I have an idea to ignore fp_next_slot entirely if we have
> > extended by multiple blocks: The backend that does the extension
> > stores in the FSM root page 1) the number of blocks added and 2) the
> > end-most block number. Any request for space will look for a valid
> > value here first before doing the usual search. If there is, then the
> > block to try is based on a hash of the xid. Something like:
> >
> > candidate-block = prev-end-of-relation + 1 + (xid % (num-new-blocks))
>
> I was thinking of doing something in shared memory, and not using the
> FSM here at all. If we're really giving 20 pages out to each backend,
> we will probably benefit from explicitly assigning contiguous ranges
> of pages to each backend, and making some effort to respect that they
> own the blocks in some general sense.

That would give us flexibility and precise control, but I suspect it
would also carry more cognitive and maintenance cost, since there would
be more than one source of information about free space.

> Hopefully without also losing
> access to the free space in corner cases (e.g. one of the backends
> has an error shortly after receiving its contiguous range of blocks).

Right, you'd need some way of resetting or retiring the shared memory
info when it is no longer useful. That was my thinking with the
collision counter -- go back to using the FSM when we no longer have a
reasonable chance of getting a fresh block.

> > Also num-new-blocks above could be scaled down from the actual number
> > of blocks added, just to make sure writes aren't happening all over
> > the place.
>
> Or scaled up, perhaps.

I don't think I explained this part well, so here's a concrete example:
Say 128 blocks were added at once. Then xid % 128 would give a number
to add to the previous last block in the relation, so new target block
allocations could land anywhere in those 128 blocks. By "scale down", I
mean compute (say) xid % 32 instead, which would limit the loss of
spatial locality for the backends that were waiting on extension. Doing
the opposite, like xid % 256, would hand out blocks past the end of the
relation. Thinking out loud further: after detecting enough collisions
in the first 32 blocks, we could iterate through the other ranges and
finally revert to a conventional FSM search.
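
To make that concrete, here is a rough standalone sketch. Nothing in it
is a real PostgreSQL API: the BlockNumber/TransactionId stand-ins,
ExtensionHint, candidate_block(), RANGE_SIZE, and MAX_COLLISIONS are
all invented names, and the collision threshold is arbitrary.

    /*
     * Standalone sketch only -- nothing here is a real PostgreSQL API.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t BlockNumber;
    typedef uint32_t TransactionId;

    #define RANGE_SIZE      32      /* scaled-down window from the example */
    #define MAX_COLLISIONS  3       /* collisions tolerated per range */

    /* Hypothetical hint the extending backend stores in the FSM root page. */
    typedef struct ExtensionHint
    {
        BlockNumber prev_end;       /* end-most block before the extension */
        BlockNumber num_new_blocks; /* e.g. 128 in the example above */
    } ExtensionHint;

    /*
     * Choose a candidate block: hash the xid into a RANGE_SIZE-block
     * range, starting with the first range and advancing to the next
     * after every MAX_COLLISIONS collisions.  Returns false once all
     * ranges are exhausted, at which point the caller reverts to a
     * conventional FSM search.
     */
    static bool
    candidate_block(const ExtensionHint *hint, TransactionId xid,
                    int num_collisions, BlockNumber *blkno)
    {
        BlockNumber nranges = (hint->num_new_blocks + RANGE_SIZE - 1) / RANGE_SIZE;
        BlockNumber range = num_collisions / MAX_COLLISIONS;
        BlockNumber range_start;
        BlockNumber range_len;

        if (range >= nranges)
            return false;           /* give up; do a normal FSM search */

        range_start = range * RANGE_SIZE;
        range_len = hint->num_new_blocks - range_start;
        if (range_len > RANGE_SIZE)
            range_len = RANGE_SIZE; /* the last range may be partial */

        *blkno = hint->prev_end + 1 + range_start + (xid % range_len);
        return true;
    }

    int
    main(void)
    {
        ExtensionHint hint = {.prev_end = 1000, .num_new_blocks = 128};
        BlockNumber blkno;
        int         collisions;

        /* simulate one backend (xid 12345) hitting repeated collisions */
        for (collisions = 0;
             candidate_block(&hint, 12345, collisions, &blkno);
             collisions += MAX_COLLISIONS)
            printf("collisions=%2d -> try block %u\n", collisions, blkno);

        printf("all ranges tried; revert to conventional FSM search\n");
        return 0;
    }

With 128 new blocks this walks through four 32-block ranges before
falling back, so a burst of collisions degrades locality gradually
rather than scattering writes across all 128 blocks at once.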
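And the extension-count idea from the top of this mail might look
something like the sketch below. Again, everything is an assumption:
blocks_to_extend_by(), the stored counter, the starting exponent, and
the ceiling of 64 are made up for illustration.

    #include <stdint.h>

    #define EXTEND_CEILING  64      /* == 1 << 6; "much smaller than 512" */

    /* Sketch of pow2(2 + num-times-ext-mult-blocks) with a ceiling. */
    static uint32_t
    blocks_to_extend_by(uint32_t num_times_ext_mult_blocks)
    {
        uint32_t shift = 2 + num_times_ext_mult_blocks;

        /* capping the shift also avoids overflow for large counts */
        if (shift > 6)
            return EXTEND_CEILING;
        return UINT32_C(1) << shift;    /* 4, 8, 16, 32, 64 */
    }

The point versus 20 * numwaiters is that ramp-up would depend on
demonstrated demand over time rather than on a momentary count of
waiters, so no backend would need to inspect the wait queue at all.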
--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services