On Wed, Jul 24, 2019 at 10:46:42AM +1200, Thomas Munro wrote:
> On Wed, Jul 24, 2019 at 10:42 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> > On Wed, Jul 24, 2019 at 10:03:25AM +1200, Thomas Munro wrote:
> > > On Wed, Jul 24, 2019 at 5:42 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> > > > #2  0x000000000085ddff in errfinish (dummy=<value optimized out>) at elog.c:555
> > > >         edata = <value optimized out>
> > >
> > > If you have that core, it might be interesting to go to frame 2 and
> > > print *edata or edata->saved_errno.
> >
> > As you saw.. unless you know a trick, it's "optimized out".
>
> How about something like this:
>
> print errordata[errordata_stack_depth]
Clever.

(gdb) p errordata[errordata_stack_depth]
$2 = {elevel = 13986192, output_to_server = 254, output_to_client = 127, show_funcname = false, hide_stmt = false,
  hide_ctx = false, filename = 0x27b3790 "< %m %u >", lineno = 41745456,
  funcname = 0x3030313335 <Address 0x3030313335 out of bounds>, domain = 0x0, context_domain = 0x27cff90 "postgres",
  sqlerrcode = 0, message = 0xe8800000001 <Address 0xe8800000001 out of bounds>,
  detail = 0x297a <Address 0x297a out of bounds>, detail_log = 0x0, hint = 0xe88 <Address 0xe88 out of bounds>,
  context = 0x297a <Address 0x297a out of bounds>, message_id = 0x0, schema_name = 0x0, table_name = 0x0,
  column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0,
  internalquery = 0x0, saved_errno = 0, assoc_context = 0x0}

(gdb) p errordata
$3 = {{elevel = 22, output_to_server = true, output_to_client = false, show_funcname = false, hide_stmt = false,
  hide_ctx = false, filename = 0x9c4030 "origin.c", lineno = 591,
  funcname = 0x9c46e0 "CheckPointReplicationOrigin", domain = 0x9ac810 "postgres-11",
  context_domain = 0x9ac810 "postgres-11", sqlerrcode = 4293,
  message = 0x27b0e40 "could not write to file \"pg_logical/replorigin_checkpoint.tmp\": No space left on device",
  detail = 0x0, detail_log = 0x0, hint = 0x0, context = 0x0,
  message_id = 0x8a7aa8 "could not write to file \"%s\": %m", ...

I ought to have remembered that it *was* in fact out of space this AM when this core was dumped (due to not
having touched it since scheduling the transition to this VM last week).  I want to say I'm almost certain it
wasn't ENOSPC in the other cases, since, failing to find any log output, I ran df right after the failure.

But that gives me an idea: is it possible there's an issue with files being held open by worker processes?
Including by parallel workers?  Probably WALs, even after they're rotated?  If worker processes were holding
open lots of rotated WALs, that could cause ENOSPC, but it wouldn't be obvious after they die, since the space
would then be freed.

Justin
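PS: A rough sketch of how I might test that theory next time it happens (the PID here is just a placeholder for
whichever backend/worker is of interest): lsof's +L1 option lists open files whose link count has dropped below 1,
i.e. files that have been unlinked but whose space is still pinned by an open descriptor:

    lsof +L1 -p <postgres PID>

Or, without lsof, /proc marks such fds as "(deleted)":

    ls -l /proc/<postgres PID>/fd | grep deleted

If rotated WAL segments showed up there, that would explain df reporting a full filesystem with nothing visible
accounting for the space, and the space reappearing once the holders exit.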