FYI, we have a 9.3.5 instance with commit_delay = 4000 and commit_siblings = 5 on an 8TB dataset, which seems fine. (It runs on different, faster hardware though.)

Spiros Ioannou
IT Manager, inAccess
www.inaccess.com <http://www.inaccess.com>
M: +30 6973-903808
T: +30 210-6802-358

On 20 July 2015 at 15:01, Andres Freund <and...@anarazel.de> wrote:

> Heikki,
>
> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
> > On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
> > > Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
> > > XLogFlush().
> >
> > These are the relevant stack traces:
> >
> > db9lock/debuglog-commit.txt
> > #2  0x00007f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0xffffffffffffffff)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
> > #3  0x00007f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
> > #4  0x00007f7405a0e1d3 in XLogFlush (record=121713318911056)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
> >
> > db9lock/debuglog-insert-8276.txt
> > #1  0x00007f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 '\000')
> >     at pg_sema.c:421
> > #2  0x00007f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, valptr=<optimized out>, mode=<optimized out>, l=<optimized out>)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
> > #3  LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
> > #4  0x00007f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, opportunistic=<optimized out>)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
> > #5  0x00007f7405a0e301 in GetXLogBuffer (ptr=121713318928384)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
> > #6  0x00007f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, StartPos=<optimized out>, rdata=0x7ffff1c21b90, isLogSwitch=<optimized out>, write_len=<optimized out>)
> >     at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
> > #7  XLogInsert (rmid=<optimized out>, info=<optimized out>, rdata=<optimized out>)
> >     at /tmp/buildd/postgre
>
> XLogFlush() has the following comment:
>
>         /*
>          * Re-check how far we can now flush the WAL. It's generally not
>          * safe to call WaitXLogInsertionsToFinish while holding
>          * WALWriteLock, because an in-progress insertion might need to
>          * also grab WALWriteLock to make progress. But we know that all
>          * the insertions up to insertpos have already finished, because
>          * that's what the earlier WaitXLogInsertionsToFinish() returned.
>          * We're only calling it again to allow insertpos to be moved
>          * further forward, not to actually wait for anyone.
>          */
>         insertpos = WaitXLogInsertionsToFinish(insertpos);
>
> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
> there's an exclusive locker and some backend has not yet set
> initializedUpto. Which seems like a possible state?
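
For anyone following along, the blocking condition described in the quoted message can be written down as a toy model. This is only a sketch and not PostgreSQL code: ToyLock, TOY_INVALID_PTR and toy_wait_would_block() are hypothetical names, and the model assumes nothing beyond the rule stated above, namely that a waiter passing oldval = InvalidXLogRecPtr sleeps while the lock is held exclusively and the holder has not yet published a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for InvalidXLogRecPtr (which is 0 in PostgreSQL). */
#define TOY_INVALID_PTR ((uint64_t) 0)

/* Toy lock state; this is NOT the real LWLock layout. */
typedef struct ToyLock
{
    bool     held_exclusive;  /* some backend holds the lock exclusively */
    uint64_t val;             /* value published by the holder, if any */
} ToyLock;

/*
 * Returns true if a waiter that passes 'oldval' would have to sleep under
 * the rule described in the email: the lock is held exclusively and the
 * published value still equals oldval.
 */
static bool
toy_wait_would_block(const ToyLock *lock, uint64_t oldval)
{
    assert(lock != NULL);
    return lock->held_exclusive && lock->val == oldval;
}
```

In this model a held lock whose value is still TOY_INVALID_PTR makes toy_wait_would_block() return true for oldval = TOY_INVALID_PTR, while a free lock, or a held lock that has already published a position, returns false. The first case is the state the quoted message is worried about.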