On Fri, May 10, 2019 at 11:27:07AM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> > Core was generated by `postgres: debian regression [local] SELECT
> > '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > 2748 malloc.c: No such file or directory.
> > #0 sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > #1 0x00003fff915bedc8 in _int_malloc (av=0x3fff916e0d28 <main_arena>,
> > bytes=8192) at malloc.c:3865
> > #2 0x00003fff915c1064 in __GI___libc_malloc (bytes=8192) at malloc.c:2928
> > #3 0x00000000106acfd8 in AllocSetContextCreateInternal
> > (parent=0x1000babdad0, name=0x1085508c "inline_function",
> > minContextSize=<optimized out>, initBlockSize=<optimized out>,
> > maxBlockSize=8388608) at aset.c:477
> > #4 0x00000000103d5e00 in inline_function (funcid=20170,
> > result_type=<optimized out>, result_collid=<optimized out>,
> > input_collid=<optimized out>, funcvariadic=<optimized out>,
> > func_tuple=<optimized out>, context=0x3fffe3da15d0, args=<optimized out>)
> > at clauses.c:4459
> > #5 simplify_function (funcid=<optimized out>, result_type=<optimized out>,
> > result_typmod=<optimized out>, result_collid=<optimized out>,
> > input_collid=<optimized out>, args_p=<optimized out>,
> > funcvariadic=<optimized out>, process_args=<optimized out>,
> > allow_non_const=<optimized out>, context=<optimized out>) at clauses.c:4040
> > #6 0x00000000103d2e74 in eval_const_expressions_mutator
> > (node=0x1000babe968, context=0x3fffe3da15d0) at clauses.c:2474
> > #7 0x00000000103511bc in expression_tree_mutator (node=<optimized out>,
> > mutator=0x103d2b10 <eval_const_expressions_mutator>,
> > context=0x3fffe3da15d0) at nodeFuncs.c:2893
> >
> > So that lets out any theory that somehow we're getting into a weird
> > control path that misses calling check_stack_depth;
> > expression_tree_mutator does so for one, and it was called just nine
> > stack frames down from the crash.
>
> Right. There's plenty places checking it...
>
>
> > I am wondering if, somehow, the stack depth limit seen by the postmaster
> > sometimes doesn't apply to its children. That would be pretty wacko
> > kernel behavior, especially if it's only intermittently true.
> > But we're running out of other explanations.
>
> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV. The signal has that information, but
> unfortunately the buildfarm code doesn't print it. p $_siginfo would
> show us some of that...
>
> Mark, how tight is the memory on that machine?
There's about 2GB allocated:

debian@postgresql-debian:~$ cat /proc/meminfo
MemTotal:        2080704 kB
MemFree:         1344768 kB
MemAvailable:    1824192 kB

At the moment it looks like plenty. :) Maybe I should set something up
to monitor these things.

> Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It's been up for about 49 days:

debian@postgresql-debian:~$ uptime
 14:54:30 up 49 days, 14:59, 3 users, load average: 0.00, 0.34, 1.04

I see one line from dmesg that is related to postgres:

[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: 00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c

But only that one time in 49 days up. Otherwise I see a half dozen
hung_task_timeout_secs messages around jbd2 and dhclient.

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/