On Fri, May 10, 2019 at 11:27:07AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> > Core was generated by `postgres: debian regression [local] SELECT           
> >                           '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > 2748        malloc.c: No such file or directory.
> > #0  sysmalloc (nb=8208, av=0x3fff916e0d28 <main_arena>) at malloc.c:2748
> > #1  0x00003fff915bedc8 in _int_malloc (av=0x3fff916e0d28 <main_arena>, 
> > bytes=8192) at malloc.c:3865
> > #2  0x00003fff915c1064 in __GI___libc_malloc (bytes=8192) at malloc.c:2928
> > #3  0x00000000106acfd8 in AllocSetContextCreateInternal 
> > (parent=0x1000babdad0, name=0x1085508c "inline_function", 
> > minContextSize=<optimized out>, initBlockSize=<optimized out>, 
> > maxBlockSize=8388608) at aset.c:477
> > #4  0x00000000103d5e00 in inline_function (funcid=20170, 
> > result_type=<optimized out>, result_collid=<optimized out>, 
> > input_collid=<optimized out>, funcvariadic=<optimized out>, 
> > func_tuple=<optimized out>, context=0x3fffe3da15d0, args=<optimized out>) 
> > at clauses.c:4459
> > #5  simplify_function (funcid=<optimized out>, result_type=<optimized out>, 
> > result_typmod=<optimized out>, result_collid=<optimized out>, 
> > input_collid=<optimized out>, args_p=<optimized out>, 
> > funcvariadic=<optimized out>, process_args=<optimized out>, 
> > allow_non_const=<optimized out>, context=<optimized out>) at clauses.c:4040
> > #6  0x00000000103d2e74 in eval_const_expressions_mutator 
> > (node=0x1000babe968, context=0x3fffe3da15d0) at clauses.c:2474
> > #7  0x00000000103511bc in expression_tree_mutator (node=<optimized out>, 
> > mutator=0x103d2b10 <eval_const_expressions_mutator>, 
> > context=0x3fffe3da15d0) at nodeFuncs.c:2893
> 
> 
> > So that lets out any theory that somehow we're getting into a weird
> > control path that misses calling check_stack_depth;
> > expression_tree_mutator does so for one, and it was called just nine
> > stack frames down from the crash.
> 
> Right. There's plenty places checking it...
> 
> 
> > I am wondering if, somehow, the stack depth limit seen by the postmaster
> > sometimes doesn't apply to its children.  That would be pretty wacko
> > kernel behavior, especially if it's only intermittently true.
> > But we're running out of other explanations.
> 
> I wonder if this is a SIGSEGV that actually signals an OOM
> situation. Linux, if it can't actually extend the stack on-demand due to
> OOM, sends a SIGSEGV.  The signal has that information, but
> unfortunately the buildfarm code doesn't print it.  p $_siginfo would
> show us some of that...
> 
> Mark, how tight is the memory on that machine?

There's about 2GB allocated:

debian@postgresql-debian:~$ cat /proc/meminfo
MemTotal:        2080704 kB
MemFree:         1344768 kB
MemAvailable:    1824192 kB


At the moment it looks like plenty. :)  Maybe I should set something up
to monitor these things.
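
A minimal sketch of what such monitoring could look like, assuming it is
enough to sample /proc/meminfo periodically (say, from cron) and flag the
moments when MemAvailable gets low.  The function names and the 10%
threshold are made up for illustration; nothing like this is running on
the machine today.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Fraction of MemTotal that the kernel reports as MemAvailable."""
    return info["MemAvailable"] / info["MemTotal"]


def is_memory_tight(info, threshold=0.10):
    """True when less than `threshold` of total memory is available.

    The 10% default is an arbitrary assumption, not a tuned value.
    """
    return available_fraction(info) < threshold


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling is_memory_tight(read_meminfo()) once a minute and logging the
result with a timestamp would at least show whether memory was short
around the time of a crash.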

> Does dmesg have any other
> information (often segfaults are logged by the kernel with the code
> IIRC).

It's been up for about 49 days:

debian@postgresql-debian:~$ uptime
 14:54:30 up 49 days, 14:59,  3 users,  load average: 0.00, 0.34, 1.04


I see one line from dmesg that is related to postgres:

[3939350.616849] postgres[17057]: bad frame in setup_rt_frame: 00003fffe3d9fe00 nip 00003fff915bdba0 lr 00003fff915bde9c


But that is the only occurrence in 49 days of uptime.  Otherwise I see a
half dozen hung_task_timeout_secs messages around jbd2 and dhclient.

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

