Hi,

while stress-testing a physical replication to test another patch, I've
repeatedly hit this assert in GetSnapshotDataReuse:

  Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));

AFAICS the backtrace always looks exactly the same - a couple frames
from the top:

#4  0x0000000000b5af36 in ExceptionalCondition (conditionName=0xe2a470
"TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)",
fileName=0xe29768 "procarray.c",
    lineNumber=2132) at assert.c:66
#5  0x000000000094c2c3 in GetSnapshotDataReuse (snapshot=0x1079380
<CatalogSnapshotData>) at procarray.c:2132
#6  0x000000000094c4b0 in GetSnapshotData (snapshot=0x1079380
<CatalogSnapshotData>) at procarray.c:2233
#7  0x0000000000bb31f8 in GetNonHistoricCatalogSnapshot (relid=2615) at
snapmgr.c:412
#8  0x0000000000bb31a2 in GetCatalogSnapshot (relid=2615) at snapmgr.c:385
#9  0x0000000000493073 in systable_beginscan
(heapRelation=0x76cee3dc01d8, indexId=2684, indexOK=true, snapshot=0x0,
nkeys=1, key=0x7ffcaddd2f30) at genam.c:414
#10 0x0000000000b383c0 in SearchCatCacheMiss (cache=0xa99a880, nkeys=1,
hashValue=2798280003, hashIndex=3, v1=130630962093252, v2=0, v3=0, v4=0)
at catcache.c:1533
#11 0x0000000000b38290 in SearchCatCacheInternal (cache=0xa99a880,
nkeys=1, v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1464
#12 0x0000000000b37f69 in SearchCatCache (cache=0xa99a880,
v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1318
#13 0x0000000000b53df5 in SearchSysCache (cacheId=37,
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:217
#14 0x0000000000b54318 in GetSysCacheOid (cacheId=37, oidcol=1,
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:459

At first I assumed it's something my patch broke, but then I tried the
stress test on master, and I hit that assert again, so it's a
pre-existing issue.

The stress test is fairly simple - it creates a primary/standby cluster,
and then does pgbench on both nodes (read-write on primary, read-only on
standby), and also randomly restarts both nodes (in fast or immediate
modes). The script I use is attached.

It takes a while to hit it - on my laptop it takes an hour or so, but I
guess it's more about the random sleeps in the script.

I've only ever seen this on the standby, never on the primary.


regards

-- 
Tomas Vondra
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO 
(ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x000076cee428b183 in __pthread_kill_internal (threadid=<optimized out>, 
signo=6) at pthread_kill.c:78
#2  0x000076cee4231f9e in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#3  0x000076cee4219942 in __GI_abort () at abort.c:79
#4  0x0000000000b5af36 in ExceptionalCondition (conditionName=0xe2a470 
"TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", fileName=0xe29768 
"procarray.c", 
    lineNumber=2132) at assert.c:66
#5  0x000000000094c2c3 in GetSnapshotDataReuse (snapshot=0x1079380 
<CatalogSnapshotData>) at procarray.c:2132
#6  0x000000000094c4b0 in GetSnapshotData (snapshot=0x1079380 
<CatalogSnapshotData>) at procarray.c:2233
#7  0x0000000000bb31f8 in GetNonHistoricCatalogSnapshot (relid=2615) at 
snapmgr.c:412
#8  0x0000000000bb31a2 in GetCatalogSnapshot (relid=2615) at snapmgr.c:385
#9  0x0000000000493073 in systable_beginscan (heapRelation=0x76cee3dc01d8, 
indexId=2684, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffcaddd2f30) at 
genam.c:414
#10 0x0000000000b383c0 in SearchCatCacheMiss (cache=0xa99a880, nkeys=1, 
hashValue=2798280003, hashIndex=3, v1=130630962093252, v2=0, v3=0, v4=0) at 
catcache.c:1533
#11 0x0000000000b38290 in SearchCatCacheInternal (cache=0xa99a880, nkeys=1, 
v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1464
#12 0x0000000000b37f69 in SearchCatCache (cache=0xa99a880, v1=130630962093252, 
v2=0, v3=0, v4=0) at catcache.c:1318
#13 0x0000000000b53df5 in SearchSysCache (cacheId=37, key1=130630962093252, 
key2=0, key3=0, key4=0) at syscache.c:217
#14 0x0000000000b54318 in GetSysCacheOid (cacheId=37, oidcol=1, 
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:459
#15 0x000000000054ba84 in get_namespace_oid (nspname=0x76cee4179cc4 "user", 
missing_ok=true) at namespace.c:3539
#16 0x000000000054ca87 in preprocessNamespacePath (searchPath=0xa9486b0 
"\"$user\", public", roleid=10, temp_missing=0xa958ebc) at namespace.c:4150
#17 0x000000000054cdb0 in cachedNamespacePath (searchPath=0xa9486b0 "\"$user\", 
public", roleid=10) at namespace.c:4261
#18 0x000000000054ce9f in recomputeNamespacePath () at namespace.c:4309
#19 0x0000000000547d24 in RelnameGetRelid (relname=0xa92d708 
"pgbench_accounts") at namespace.c:890
#20 0x00000000005474cc in RangeVarGetRelidExtended (relation=0xa92d738, 
lockmode=1, flags=1, callback=0x0, callback_arg=0x0) at namespace.c:539
#21 0x000000000041d7d7 in relation_openrv_extended (relation=0xa92d738, 
lockmode=1, missing_ok=true) at relation.c:185
#22 0x00000000004d509d in table_openrv_extended (relation=0xa92d738, 
lockmode=1, missing_ok=true) at table.c:108
#23 0x00000000005d220c in parserOpenTable (pstate=0xa92db30, 
relation=0xa92d738, lockmode=1) at parse_relation.c:1441
#24 0x00000000005d2436 in addRangeTableEntry (pstate=0xa92db30, 
relation=0xa92d738, alias=0x0, inh=true, inFromCl=true) at parse_relation.c:1518
#25 0x00000000005ab7ec in transformTableEntry (pstate=0xa92db30, r=0xa92d738) 
at parse_clause.c:398
#26 0x00000000005ad252 in transformFromClauseItem (pstate=0xa92db30, 
n=0xa92d738, top_nsitem=0x7ffcaddd37c8, namespace=0x7ffcaddd37c0) at 
parse_clause.c:1073
#27 0x00000000005ab0e9 in transformFromClause (pstate=0xa92db30, 
frmList=0xa92d788) at parse_clause.c:131
#28 0x0000000000578644 in transformSelectStmt (pstate=0xa92db30, 
stmt=0xa92d9a0) at analyze.c:1458
#29 0x0000000000576a03 in transformStmt (pstate=0xa92db30, parseTree=0xa92d9a0) 
at analyze.c:458
#30 0x000000000057687c in transformOptionalSelectInto (pstate=0xa92db30, 
parseTree=0xa92d9a0) at analyze.c:388
#31 0x0000000000576775 in transformTopLevelStmt (pstate=0xa92db30, 
parseTree=0xa92dab0) at analyze.c:341
#32 0x00000000005762cc in parse_analyze_fixedparams (parseTree=0xa92dab0, 
sourceText=0xa92cbc0 "SELECT abalance FROM pgbench_accounts WHERE aid = 
59119;", paramTypes=0x0, 
    numParams=0, queryEnv=0x0) at analyze.c:122
#33 0x000000000097ff25 in pg_analyze_and_rewrite_fixedparams 
(parsetree=0xa92dab0, query_string=0xa92cbc0 "SELECT abalance FROM 
pgbench_accounts WHERE aid = 59119;", 
    paramTypes=0x0, numParams=0, queryEnv=0x0) at postgres.c:682
#34 0x00000000009808bd in exec_simple_query (query_string=0xa92cbc0 "SELECT 
abalance FROM pgbench_accounts WHERE aid = 59119;") at postgres.c:1188
#35 0x000000000098556f in PostgresMain (dbname=0xa967320 "test_1", 
username=0xa967308 "user") at postgres.c:4726
#36 0x000000000097cbb3 in BackendMain (startup_data=0x7ffcaddd3cc0, 
startup_data_len=24) at backend_startup.c:122
#37 0x00000000008a06d7 in postmaster_child_launch (child_type=B_BACKEND, 
child_slot=9, startup_data=0x7ffcaddd3cc0, startup_data_len=24, 
client_sock=0x7ffcaddd3d10)
    at launch_backend.c:288
#38 0x00000000008a6461 in BackendStartup (client_sock=0x7ffcaddd3d10) at 
postmaster.c:3524
--Type <RET> for more, q to quit, c to continue without paging--
#39 0x00000000008a3f43 in ServerLoop () at postmaster.c:1687
#40 0x00000000008a3912 in PostmasterMain (argc=3, argv=0xa926860) at 
postmaster.c:1385
#41 0x000000000075aedf in main (argc=3, argv=0xa926860) at main.c:230

Attachment: test2.sh
Description: application/shellscript

Reply via email to