Hi folks,

I am running a 9.0.3 Hot Standby + Streaming Replication slave which occasionally segfaults (every 1-2 days). I rebuilt Postgres with --enable-cassert and --enable-debug, switched on core dumping and waited for some results.

The first crash since enabling debugging was a failed assert in heaptuple.c:

TRAP: FailedAssertion("!((data - start) == data_size)", File: "heaptuple.c", Line: 255)
2011-04-07 04:20:20 EST LOG: server process (PID 32195) was terminated by signal 6: Aborted
2011-04-07 04:20:20 EST LOG: terminating any other active server processes

For context, the only things running on this server are the slave database, and a tomcat instance. The tomcat instance is the only connection into this database, which continually runs through a series of SELECTs (100ms sleep between each run). The slave database is basically stock standard 9.0.3 config, apart from the replication setup and shared_buffers increased to 2GB.

Here's the backtrace:

Core was generated by `postgres: backend surecast 127.0.0.1(37155) SELECT '.
Program terminated with signal 6, Aborted.
#0 0x00007f40a93aba75 in *__GI_raise (sig=<value optimised out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007f40a93aba75 in *__GI_raise (sig=<value optimised out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f40a93af5c0 in *__GI_abort () at abort.c:92
#2 0x00000000006f861d in ExceptionalCondition (conditionName=<value optimised out>, errorType=<value optimised out>, fileName=<value optimised out>, lineNumber=<value optimised out>) at assert.c:57
#3 0x0000000000459b07 in heap_form_minimal_tuple (tupleDescriptor=0xf6bdd0, values=0x8, isnull=0xf6c680 "") at heaptuple.c:1459
#4 0x0000000000580d12 in ExecFetchSlotMinimalTuple (slot=0xf6bb90) at execTuples.c:684
#5 0x0000000000588d10 in ExecHashTableInsert (hashtable=0xf4c3b0, slot=0x7dc3, hashvalue=6) at nodeHash.c:697
#6 0x0000000000589bf6 in MultiExecHash (node=<value optimised out>) at nodeHash.c:123
#7 0x000000000058a9ab in ExecHashJoin (node=0xf24008) at nodeHashjoin.c:154
#8 0x00000000005788a8 in ExecProcNode (node=0xf24008) at execProcnode.c:427
#9 0x000000000058fb21 in ExecNestLoop (node=0xf8eb98) at nodeNestloop.c:120
#10 0x00000000005788c8 in ExecProcNode (node=0xf8eb98) at execProcnode.c:419
#11 0x000000000057756d in ExecutePlan (queryDesc=0xf8bc30, direction=32195, count=0) at execMain.c:1187
#12 standard_ExecutorRun (queryDesc=0xf8bc30, direction=32195, count=0) at execMain.c:280
#13 0x0000000000642e28 in PortalRunSelect (portal=0xf11f88, forward=<value optimised out>, count=0, dest=0xe00120) at pquery.c:952
#14 0x00000000006442e9 in PortalRun (portal=<value optimised out>, count=<value optimised out>, isTopLevel=<value optimised out>, dest=<value optimised out>, altdest=<value optimised out>, completionTag=<value optimised out>) at pquery.c:796
#15 0x00000000006419e3 in exec_execute_message (argc=<value optimised out>, argv=<value optimised out>, username=<value optimised out>) at postgres.c:2003
#16 PostgresMain (argc=<value optimised out>, argv=<value optimised out>, username=<value optimised out>) at postgres.c:3988
#17 0x0000000000606a07 in BackendRun () at postmaster.c:3555
#18 BackendStartup () at postmaster.c:3242
#19 ServerLoop () at postmaster.c:1431
#20 0x000000000060931d in PostmasterMain (argc=14918336, argv=0xdfb160) at postmaster.c:1092
#21 0x00000000005a9310 in main (argc=5, argv=0xdfb140) at main.c:188

Let me know if there is any additional information I can provide.

Cheers,
BJ

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers