On 10/28/22 9:54 PM, Andres Freund wrote:
b) I found that it is quite beneficial to bulk-extend the relation with
    smgrextend() even without concurrency. The reason for that is primarily
    the aforementioned dirty buffers that our current extension method causes.

    One bit that stumped me for quite a while is to know how much to extend the
    relation by. RelationGetBufferForTuple() drives the decision whether / how
    much to bulk extend purely on the contention on the extension lock, which
    obviously does not work for non-concurrent workloads.

    After quite a while I figured out that we actually have good information on
    how much to extend by, at least for COPY /
    heap_multi_insert(). heap_multi_insert() can compute how much space is
    needed to store all tuples, and pass that on to
    RelationGetBufferForTuple().

    For that to be accurate we need to recompute that number whenever we use an
    already partially filled page. That's not great, but doesn't appear to be a
    measurable overhead.
Some food for thought: I think it's also completely fine to extend any relation over a certain size by multiple blocks, regardless of concurrency. E.g. 10 extra blocks on an 80MB relation is about 0.1%. I don't have a good feel for what algorithm would make sense here; maybe something along the lines of extend = min(relpages / 2048, 128); if (extend < 8) extend = 1; (presumably extending by just a couple of extra pages doesn't help much without concurrency).
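
To make the heuristic concrete, here is a minimal sketch in C. It is only an illustration, not a patch: bulk_extend_by() and the three constants are hypothetical names, not existing PostgreSQL symbols. It also assumes the 128 is meant as an upper cap (i.e. min() rather than max()), since otherwise the "extend < 8" branch could never trigger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning knobs, taken from the numbers suggested above. */
#define EXTEND_DIVISOR     2048  /* extend by about 1/2048 (~0.05%) of the relation */
#define EXTEND_MAX_BLOCKS   128  /* assumed upper bound on blocks added at once */
#define EXTEND_MIN_USEFUL     8  /* below this, bulk extension is assumed not to pay off */

/*
 * Hypothetical helper, not a PostgreSQL function: given the current size of
 * a relation in blocks, return how many blocks to add in one extension.
 */
static uint32_t
bulk_extend_by(uint32_t relpages)
{
    uint32_t extend = relpages / EXTEND_DIVISOR;

    /* Cap the extension so very large relations do not over-allocate. */
    if (extend > EXTEND_MAX_BLOCKS)
        extend = EXTEND_MAX_BLOCKS;

    /* A handful of extra pages is assumed not to help; extend by one. */
    if (extend < EXTEND_MIN_USEFUL)
        extend = 1;

    assert(extend >= 1 && extend <= EXTEND_MAX_BLOCKS);
    return extend;
}
```

With the default 8kB block size, these particular constants would keep single-block extension up to 16383 pages (just under 128MB), start bulk-extending by 8 blocks at 16384 pages, and reach the 128-block cap at 262144 pages (2GB). Note that an 80MB relation (10240 pages) would still extend by one block under these numbers, so the divisor and thresholds would need tuning.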

