On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 06/26/2015 08:02 PM, Jeff Janes wrote: > >> Under high load against f7bb7f0625771bc71869cda, I occasionally get: >> >> PANIC: XLogBeginInsert was not called >> >> It seems to only come from vacuuming. >> >> Here is an example back-trace: >> >> #0 0x0000003dcb632625 in raise () from /lib64/libc.so.6 >> #1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6 >> #2 0x000000000079a39c in errfinish (dummy=<value optimized out>) at >> elog.c:551 >> #3 0x000000000079b0e4 in elog_finish (elevel=<value optimized out>, >> fmt=<value optimized out>) at elog.c:1368 >> #4 0x00000000004dd1fe in XLogInsert (rmid=13 '\r', info=48 '0') at >> xloginsert.c:412 >> #5 0x0000000000478e13 in ginPlaceToPage (btree=0x7fff1fbb7f60, >> stack=0x2878760, insertdata=<value optimized out>, updateblkno=<value >> optimized out>, >> childbuf=0, buildStats=<value optimized out>) at ginbtree.c:582 >> #6 0x0000000000479a33 in ginInsertValue (btree=0x7fff1fbb7f60, >> stack=0x2878760, insertdata=0x7fff1fbb7fe0, buildStats=0x0) at >> ginbtree.c:751 >> #7 0x000000000047364a in ginEntryInsert (ginstate=0x7fff1fbb8280, >> attnum=54624, key=1, category=<value optimized out>, items=0x287d3c0, >> nitem=1, >> buildStats=0x0) at gininsert.c:234 >> #8 0x000000000047ef4b in ginInsertCleanup (ginstate=0x7fff1fbb8280, >> vac_delay=<value optimized out>, stats=0x2868d90) at ginfast.c:843 >> #9 0x000000000047e024 in ginbulkdelete (fcinfo=<value optimized out>) at >> ginvacuum.c:547 >> >> >> From the code, it seems obvious that XLogBeginInsert is getting called at >> ginbtree.c line 373, so I think that some obscure code path of >> btree->placeToPage must be either consuming or resetting that >> XLogBeginInsert before it returns control to ginbtree.c >> > > Here's a theory: > > First of all, you have checksums or wal_log_hints enabled. > Correct, checksums are on. I forgot all about that. > > The page is being split (that's evident from "info=48" above). > ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That > finds a page that can be recycled, and marks it as used. > RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls > XLogSaveBufferForHint() to create a full-page image record of the page. > That calls XLogBeginInsert() + XLogInsert(), and leaves the > begininsert_called == false. > > If you had assertions enabled, you'd see the assertion in > XLogBeginInsert() to fail. > > I'll look into that over the weekend.. > > Should the assertion in XLogBeginInsert() be a elog(FATAL,...) instead? The performance impact should be negligible (touches process-local memory which is already being touched a few instructions later) and it will produce better bug reports, and backtraces, should there be more issues. I thought it would takes days to reproduce this with a enable-cassert build, but it actually reproduced much faster that way. Your theory looks completely correct. #0 0x0000003dcb632625 in raise () from /lib64/libc.so.6 #1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6 #2 0x00000000007d18bd in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:54 #3 0x00000000004f8067 in XLogBeginInsert () at xloginsert.c:125 #4 0x00000000004f83d5 in XLogSaveBufferForHint (buffer=241, buffer_std=<value optimized out>) at xloginsert.c:907 #5 0x00000000006b64af in MarkBufferDirtyHint (buffer=241, buffer_std=<value optimized out>) at bufmgr.c:3085 #6 0x00000000006be859 in fsm_set_and_search (rel=<value optimized out>, addr=..., slot=1160, newValue=0 '\000', minValue=0 '\000') at freespace.c:621 #7 0x00000000006be8f6 in RecordPageWithFreeSpace (rel=0x7f01e7fe0560, heapBlk=1160, spaceAvail=<value optimized out>) at freespace.c:188 #8 0x00000000006bf105 in GetFreeIndexPage (rel=0x7f01e7fe0560) at indexfsm.c:43 #9 0x00000000004750b2 in GinNewBuffer (index=0x7f01e7fe0560) at ginutil.c:218 #10 0x000000000047d5c4 in ginPlaceToPage (btree=0x7fffe3a55840, stack=0x1307b78, insertdata=<value optimized out>, updateblkno=<value optimized out>, childbuf=0, buildStats=0x0) at ginbtree.c:442 #11 0x000000000047f2dd in ginInsertValue (btree=0x7fffe3a55840, stack=0x1307b78, insertdata=0x7fffe3a558c0, buildStats=0x0) at ginbtree.c:751 #12 0x0000000000475b7b in ginEntryInsert (ginstate=0x7fffe3a55b80, attnum=58216, key=1, category=<value optimized out>, items=0x7f01e7f61120, nitem=11, buildStats=0x0) at gininsert.c:234 #13 0x0000000000485dfc in ginInsertCleanup (ginstate=0x7fffe3a55b80, vac_delay=<value optimized out>, stats=0x12bbbc8) at ginfast.c:843 #14 0x0000000000484d83 in ginbulkdelete (fcinfo=<value optimized out>) at ginvacuum.c:547 Cheers, Jeff