Hi,

On 2020-06-04 15:13:29 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2020-06-04 14:50:40 -0400, Tom Lane wrote:
> >> 2. The computed completePasses value would go backwards. I bet
> >> that wouldn't matter too much either, or at least we could teach
> >> BgBufferSync to cope. (I notice the comments therein suggest that
> >> it is already designed to cope with completePasses wrapping around,
> >> so maybe nothing needs to be done.)
>
> > If we're not concerned about that, then we can remove the
> > atomic-inside-spinlock, I think. The only reason for that right now is
> > to avoid assuming a wrong pass number.
>
> Hmm. That might be a less-invasive path to a solution. I can take
> a look, if you don't want to.
First, I think it would be problematic:

	/*
	 * Find out where the freelist clock sweep currently is, and how many
	 * buffer allocations have happened since our last call.
	 */
	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
...
	/*
	 * Compute strategy_delta = how many buffers have been scanned by the
	 * clock sweep since last time. If first time through, assume none. Then
	 * see if we are still ahead of the clock sweep, and if so, how many
	 * buffers we could scan before we'd catch up with it and "lap" it. Note:
	 * weird-looking coding of xxx_passes comparisons are to avoid bogus
	 * behavior when the passes counts wrap around.
	 */
	if (saved_info_valid)
	{
		int32		passes_delta = strategy_passes - prev_strategy_passes;

		strategy_delta = strategy_buf_id - prev_strategy_buf_id;
		strategy_delta += (long) passes_delta * NBuffers;

		Assert(strategy_delta >= 0);

ISTM that if we can get an out-of-sync strategy_passes and
strategy_buf_id we'll end up with a pretty wrong strategy_delta (e.g.
reading nextVictimBuffer just after a wraparound, but completePasses
from before the corresponding increment, would yield a large negative
delta). Which, I think, can cause us to reset bgwriter's position:

	else
	{
		/*
		 * We're behind, so skip forward to the strategy point and start
		 * cleaning from there.
		 */
#ifdef BGW_DEBUG
		elog(DEBUG2, "bgwriter behind: bgw %u-%u strategy %u-%u delta=%ld",
			 next_passes, next_to_clean,
			 strategy_passes, strategy_buf_id,
			 strategy_delta);
#endif
		next_to_clean = strategy_buf_id;
		next_passes = strategy_passes;
		bufs_to_lap = NBuffers;
	}

While I think that the whole logic in BgBufferSync doesn't make a whole
lot of sense, it does seem to me this has a fair potential to make it
worse. In a scenario with a decent cache hit ratio (leading to high
usagecounts) and a not-that-large NBuffers, we can end up doing quite a
few passes (as in many per second), so it might not be that hard to hit
this.

I am not immediately coming up with a cheap solution that doesn't do
the atomics-within-spinlock thing. The best I can come up with is using
a 64bit atomic, with the upper 32bit containing the number of passes,
and the lower 32bit containing the current buffer. The lower 32bit /
the buffer would be handled like it currently is, i.e. we "try" to keep
it below NBuffers, so % is only needed on the "cold" path. That'd just
add a 64->32 bit cast in the hot path, which shouldn't be measurable.
But it'd regress platforms without 64bit atomics substantially. A rough
sketch of what I mean is below.

We could obviously also just rewrite the BgBufferSync() logic, so it
doesn't care about things like "lapping", but that's not an easy
change.

So the best I can really suggest, unless we were to agree on atomics
being ok inside spinlocks, is probably to just replace the spinlock
with an lwlock. That'd perhaps cause a small slowdown for a few cases,
but it'd make workloads that e.g. use the freelist a lot (e.g. when
tables are dropped regularly) scale better.

Greetings,

Andres Freund
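
To make the packed-counter idea a bit more concrete, here's a rough,
untested sketch for freelist.c. The pg_atomic_uint64 field
victimAndCompletePasses is made up for illustration; it would replace
nextVictimBuffer and completePasses in BufferStrategyControl:

static inline uint32
ClockSweepTick(void)
{
	uint64		counter;
	uint32		victim;

	/*
	 * Hot path: bump the low 32bit half; the cast simply discards the
	 * pass count stored in the high half.
	 */
	counter = pg_atomic_fetch_add_u64(&StrategyControl->victimAndCompletePasses, 1);
	victim = (uint32) counter;

	if (victim >= NBuffers)
	{
		uint32		wrapped_victim = victim % NBuffers;

		/*
		 * Cold path: whoever landed exactly on the wraparound point folds
		 * the low half back into [0, NBuffers) and carries one into the
		 * passes half with a single CAS, so both halves always change
		 * together.  (As with the current coding, the low half could in
		 * theory overflow into the high half if this is delayed for a
		 * very long time, but that seems exceedingly unlikely.)
		 */
		if (wrapped_victim == 0)
		{
			uint64		expected = counter + 1;

			for (;;)
			{
				uint64		newval;

				newval = (((expected >> 32) + 1) << 32) |
					((uint32) expected % NBuffers);

				if (pg_atomic_compare_exchange_u64(&StrategyControl->victimAndCompletePasses,
												   &expected, newval))
					break;
				/* CAS failed: expected now holds the current value, retry */
			}
		}

		victim = wrapped_victim;
	}

	return victim;
}

StrategySyncStart() could then do a single pg_atomic_read_u64() and
split the result into passes and buffer, without needing the spinlock
for that at all.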
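
For comparison, the lwlock variant would essentially just swap the lock
calls in ClockSweepTick()'s wraparound handling. Sketch only, assuming
buffer_strategy_lock were turned into an LWLock in its own tranche
rather than the slock_t it is today; expected/originalVictim are as in
the existing code:

		if (victim == 0)
		{
			uint32		expected = originalVictim + 1;
			uint32		wrapped;
			bool		success = false;

			while (!success)
			{
				/*
				 * Looping on a CAS while holding an lwlock is acceptable,
				 * unlike doing so under a spinlock.
				 */
				LWLockAcquire(&StrategyControl->buffer_strategy_lock, LW_EXCLUSIVE);

				wrapped = expected % NBuffers;

				success = pg_atomic_compare_exchange_u32(&StrategyControl->nextVictimBuffer,
														 &expected, wrapped);
				if (success)
					StrategyControl->completePasses++;

				LWLockRelease(&StrategyControl->buffer_strategy_lock);
			}
		}

The hot path would stay a plain 32bit fetch-add; any slowdown would be
limited to the wraparound handling and the paths that take the lock
today (freelist access and StrategySyncStart()).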