FYI we have a 9.3.5 instance with commit_delay = 4000 and commit_siblings = 5
and an 8TB dataset, which seems fine. (It runs on different, faster hardware,
though.)

Spiros Ioannou
IT Manager, inAccess
www.inaccess.com <http://www.inaccess.com>
M: +30 6973-903808
T: +30 210-6802-358

On 20 July 2015 at 15:01, Andres Freund <and...@anarazel.de> wrote:

> Heikki,
>
> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
> > On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
> > > Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
> > > XLogFlush().
> >
> > These are the relevant stack traces:
> > db9lock/debuglog-commit.txt
> > #2  0x00007f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0xffffffffffffffff) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
> > #3  0x00007f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
> > #4  0x00007f7405a0e1d3 in XLogFlush (record=121713318911056) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
> >
> > db9lock/debuglog-insert-8276.txt
> > #1  0x00007f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 '\000') at pg_sema.c:421
> > #2  0x00007f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
> > #3  LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
> > #4  0x00007f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, opportunistic=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
> > #5  0x00007f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
> > #6  0x00007f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, StartPos=<optimized out>, rdata=0x7ffff1c21b90, isLogSwitch=<optimized out>, write_len=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
> > #7  XLogInsert (rmid=<optimized out>, info=<optimized out>, rdata=<optimized out>) at /tmp/buildd/postgre
>
>
> XLogFlush() has the following comment:
>                         /*
>                          * Re-check how far we can now flush the WAL. It's generally not
>                          * safe to call WaitXLogInsertionsToFinish while holding
>                          * WALWriteLock, because an in-progress insertion might need to
>                          * also grab WALWriteLock to make progress. But we know that all
>                          * the insertions up to insertpos have already finished, because
>                          * that's what the earlier WaitXLogInsertionsToFinish() returned.
>                          * We're only calling it again to allow insertpos to be moved
>                          * further forward, not to actually wait for anyone.
>                          */
>                         insertpos = WaitXLogInsertionsToFinish(insertpos);
>
> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
> there's an exclusive locker and some backend hasn't yet set
> initializedUpto. Which seems like a possible state?
>
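
To make the blocking condition described above concrete, here is a rough
model of the decision a waiter makes, as I read it. This is only a sketch:
ModelLock, WaitOutcome and model_wait_for_var are hypothetical names made up
for illustration, not PostgreSQL's LWLock API, and the spinlock, wait queue
and semaphore sleep are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock state; NOT PostgreSQL's LWLock struct. */
typedef struct
{
    bool     held_exclusive;   /* is there an exclusive holder right now? */
    uint64_t var;              /* value advertised by the holder; 0 = not set yet */
} ModelLock;

typedef enum
{
    LOCK_FREE,       /* nobody holds the lock exclusively: return at once */
    VALUE_CHANGED,   /* holder advertised a value != oldval: return it */
    MUST_WAIT        /* holder present and value still == oldval: sleep */
} WaitOutcome;

/*
 * Simplified model of the check: the caller only has to wait when the lock
 * is held exclusively AND the advertised value still equals oldval.
 */
static WaitOutcome
model_wait_for_var(const ModelLock *lock, uint64_t oldval, uint64_t *newval)
{
    assert(lock != NULL && newval != NULL);

    if (!lock->held_exclusive)
        return LOCK_FREE;

    if (lock->var != oldval)
    {
        *newval = lock->var;   /* report how far the holder has advertised */
        return VALUE_CHANGED;
    }

    return MUST_WAIT;          /* real code would queue up and sleep here */
}
```

With oldval = 0 (standing in for InvalidXLogRecPtr), an exclusive holder that
has not yet advertised a position yields MUST_WAIT. That is the state
questioned above: the caller ends up waiting even though the comment in
XLogFlush() assumes it never will.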
