Good summary.  I agree that checkpoint should look as much like a
normal Proc as possible.

> At the just-past OSDN database conference, Bruce and I were annoyed
> by some benchmark results showing that Postgres performed poorly on
> an 8-way SMP machine.  Based on past discussion, it seems likely that
> the culprit is the known inefficiency in our spinlock implementation.
> After chewing on it for awhile, we came up with an idea for a
> solution.
>
> The following proposal should improve performance substantially when
> there is contention for a lock, but it creates no portability risks,
> because it uses the same system facilities (TAS and SysV semaphores)
> that we have always relied on.  Also, I think it'd be fairly easy to
> implement --- I could probably get it done in a day.
>
> Comments anyone?
>
> 			regards, tom lane
>
>
> Plan:
>
> Replace most uses of spinlocks with "lightweight locks" (LW locks)
> implemented by a new lock manager.  The principal remaining use of
> true spinlocks (TAS locks) will be to provide mutual exclusion of
> access to LW lock structures.  Therefore, we can assume that
> spinlocks are never held for more than a few dozen instructions ---
> and never across a kernel call.
>
> It's pretty easy to rejigger the spinlock code to work well when the
> lock is never held for long.  We just need to change the spinlock
> retry code so that it does a tight spin (continuous retry) for a few
> dozen cycles --- ideally, the total delay should be some small
> multiple of the max expected lock hold time.  If the lock still has
> not been acquired, yield the CPU via a select() call (10 msec minimum
> delay) and repeat.  Although this looks inefficient, it doesn't
> matter on a uniprocessor, because we expect that backends will only
> rarely be interrupted while holding the lock, so in practice a held
> lock will seldom be encountered.  On SMP machines the tight spin will
> win, since the lock will normally become available before we give up
> and yield the CPU.
>
> Desired properties of the LW lock manager include:
> * very fast fall-through when there is no contention for the lock
> * waiting procs do not spin
> * support for both exclusive and shared (read-only) lock modes
> * locks are granted to waiters in arrival order (no starvation)
> * a small lock structure, to allow many LW locks to exist
>
> Proposed contents of the LW lock structure:
>
>     spinlock mutex (protects LW lock state and PROC queue links)
>     count of exclusive holders (always 0 or 1)
>     count of shared holders (0 .. MaxBackends)
>     queue head pointer (NULL or ptr to PROC object)
>     queue tail pointer (could do without this to save space)
>
> If a backend sees it must wait to acquire the lock, it adds its PROC
> struct to the end of the queue, releases the spinlock mutex, and then
> sleeps by P'ing its per-backend wait semaphore.  A backend releasing
> the lock will check to see if any waiter should be granted the lock.
> If so, it will update the lock state, release the spinlock mutex, and
> finally V the wait semaphores of any backends that it decided should
> be released (having removed them from the lock's queue while still
> holding the spinlock mutex).  Notice that no kernel calls need be
> done while holding the spinlock.  Since the wait semaphore will
> remember a V occurring before a P, there's no problem if the releaser
> is fast enough to release the waiter before the waiter reaches its P
> operation.
>
> We will need to add a few fields to PROC structures:
> * a flag showing whether the PROC is waiting for an LW lock, and if
>   so whether it waits for read or write access
> * an additional PROC queue link field
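
To make the spin-then-yield idea concrete, here is a minimal sketch of
the revised retry loop.  It assumes a TAS() that returns nonzero when
the lock is already held; slock_t and the spin count are placeholders
to be tuned per platform, not actual source:

    #include <sys/time.h>
    #include <unistd.h>

    typedef int slock_t;                    /* stand-in for the platform type */
    extern int TAS(volatile slock_t *lock); /* nonzero => lock was busy */

    #define SPINS_PER_DELAY 100             /* placeholder: "a few dozen" retries */

    void
    s_lock_sketch(volatile slock_t *lock)
    {
        int     spins = 0;

        while (TAS(lock))                   /* tight spin while lock is busy */
        {
            if (++spins >= SPINS_PER_DELAY)
            {
                struct timeval delay;

                delay.tv_sec = 0;
                delay.tv_usec = 10000;      /* 10 msec: yield the CPU */
                select(0, NULL, NULL, NULL, &delay);
                spins = 0;                  /* then spin tightly again */
            }
        }
        /* fall out of the loop once TAS succeeds */
    }

And here is a rough sketch of the LW lock structure plus the
exclusive-mode acquire/release protocol as I read the proposal.  All
names (LWLock, SpinLockAcquire, PGSemaphoreLock, the lw* PROC fields)
are illustrative stand-ins, not actual source:

    typedef void *PGSemaphore;                          /* stand-in type */
    extern void PGSemaphoreLock(PGSemaphore sema);      /* P */
    extern void PGSemaphoreUnlock(PGSemaphore sema);    /* V */
    extern void SpinLockAcquire(volatile slock_t *lock);
    extern void SpinLockRelease(volatile slock_t *lock);

    typedef struct PROC PROC;
    struct PROC
    {
        PGSemaphore sem;            /* per-backend wait semaphore */
        int         lwWaiting;      /* true if waiting for an LW lock */
        int         lwExclusive;    /* true if the wait is for write access */
        PROC       *lwWaitLink;     /* next waiter in some LW lock's queue */
    };

    extern PROC *MyProc;

    typedef struct LWLock
    {
        slock_t     mutex;          /* protects LW lock state and queue links */
        int         exclusive;      /* count of exclusive holders (0 or 1) */
        int         shared;         /* count of shared holders (0..MaxBackends) */
        PROC       *head;           /* front of wait queue, or NULL */
        PROC       *tail;           /* end of wait queue (valid if head != NULL) */
    } LWLock;

    void
    LWLockAcquireExclusive(LWLock *lock)
    {
        SpinLockAcquire(&lock->mutex);
        if (lock->exclusive == 0 && lock->shared == 0)
        {
            lock->exclusive = 1;            /* no contention: fast fall-through */
            SpinLockRelease(&lock->mutex);
            return;
        }

        /* Must wait: add self to the queue tail, then sleep on own semaphore. */
        MyProc->lwWaiting = 1;
        MyProc->lwExclusive = 1;
        MyProc->lwWaitLink = NULL;
        if (lock->head == NULL)
            lock->head = MyProc;
        else
            lock->tail->lwWaitLink = MyProc;
        lock->tail = MyProc;
        SpinLockRelease(&lock->mutex);      /* no kernel call under the spinlock */

        /*
         * P.  The releaser updates the lock state on our behalf before
         * V'ing us, so there is nothing more to do here.  (A refined wait
         * loop, handling the shared-semaphore complication, is sketched
         * further down.)
         */
        PGSemaphoreLock(MyProc->sem);
    }

    void
    LWLockReleaseExclusive(LWLock *lock)
    {
        PROC       *waiter;

        SpinLockAcquire(&lock->mutex);
        lock->exclusive = 0;
        waiter = lock->head;                /* grant in arrival order */
        if (waiter != NULL)
        {
            lock->head = waiter->lwWaitLink;
            if (waiter->lwExclusive)
                lock->exclusive = 1;        /* ownership passes directly */
            else
                lock->shared++;             /* real code would keep waking
                                             * consecutive shared waiters */
            waiter->lwWaiting = 0;
        }
        SpinLockRelease(&lock->mutex);
        if (waiter != NULL)
            PGSemaphoreUnlock(waiter->sem); /* V only after dropping the spinlock */
    }

The point to notice is the property claimed above: the only kernel
calls (the semaphore P and V) happen after the spinlock mutex has been
released.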

> We can't reuse the existing queue link field, because it is possible
> for a PROC to be waiting for both a heavyweight lock and a
> lightweight one --- this will occur when HandleDeadLock or
> LockWaitCancel tries to acquire the LockMgr module's lightweight lock
> (formerly a spinlock).
>
> It might seem that we also need to create a second wait semaphore per
> backend: one to wait on HW locks, and one to wait on LW locks.  But I
> believe we can get away with just one, by recognizing that a wait for
> an LW lock can never be interrupted by a wait for a HW lock, only
> vice versa.  After being awoken (V'd), the LW lock manager must check
> to see if it was actually granted the lock (easiest way: look at our
> own PROC struct to see if the LW lock wait flag has been cleared).
> If not, the V must have been to grant us a HW lock --- but we still
> have to sleep to get the LW lock.  So remember that this happened,
> then loop back and P again.  When we finally get the LW lock, if
> there was an extra P operation, then V the semaphore once before
> returning.  This will allow ProcSleep to exit the wait for the HW
> lock when we return to it.
>
> Fine points:
>
> While waiting for an LW lock, we need to show in our PROC struct
> whether we are waiting for read or write access.  But we don't need
> to remember this after getting the lock; once we know we have the
> lock, it's easy to see by inspecting the lock whether we hold read or
> write access.
>
> ProcStructLock cannot be replaced by an LW lock, since a backend
> cannot use an LW lock until it has obtained a PROC struct and a
> semaphore, both of which are protected by this lock.  It seems okay
> to use a plain spinlock for this purpose.  NOTE: it's okay for
> SInvalLock to be an LW lock, as long as the LW manager does not
> depend on accessing the SI array of PROC objects, but only chains
> through the PROCs themselves.
>
> Another tricky point is that some of the setup code executed by the
> postmaster may try to grab/release LW locks.  Here, we can probably
> allow a special case for MyProc=NULL.  It's likely that we should
> never see a block under these circumstances anyway, so finding
> MyProc=NULL when we need to block may just be a fatal error
> condition.
>
> A nastier case is checkpoint processes; these expect to grab BufMgr
> and WAL locks.  Perhaps it's okay for them to do plain sleeps between
> attempts to grab the locks?  This says that the MyProc=NULL case
> should release the spinlock mutex, sleep 10 msec, and try again,
> rather than raising an error or assuming there will be no conflict.
> Are there any cases where this represents a horrid performance loss?
> Checkpoint itself seems noncritical.
>
> The alternative is for checkpoint to be allowed to create a PROC
> struct (but not to enter it in the SI list) so it can participate
> normally in LW lock operations.  That seems a good idea anyway,
> actually, so that the PROC struct's facility for releasing held LW
> locks at elog time will work inside the checkpointer.  (But that
> means we need an extra sema too?  Okay, but we don't want an extra
> would-be backend to obtain that sema and perhaps cause a checkpoint
> proc to fail.  So we must allocate the PROC and sema for the
> checkpoint process separately from those reserved for backends.)
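
The one-semaphore trick might look like this in code.  extraWaits is a
made-up name, and the primitives are the same stand-ins as in the
sketch above; this loop would replace the bare PGSemaphoreLock() call
in LWLockAcquireExclusive():

    int     extraWaits = 0;

    for (;;)
    {
        PGSemaphoreLock(MyProc->sem);       /* P */
        if (!MyProc->lwWaiting)
            break;                          /* wait flag cleared: the releaser
                                             * granted us the LW lock */
        extraWaits++;                       /* that V was a HW lock grant;
                                             * keep sleeping for the LW lock */
    }

    while (extraWaits-- > 0)
        PGSemaphoreUnlock(MyProc->sem);     /* re-issue the absorbed V's so
                                             * ProcSleep can exit its HW wait */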