I see that the segfault is under active discussion, but I just wanted to ask:
is increasing max_connections to mitigate the DSM slot shortage the way to
go?
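For a rough sense of scale, here is a small arithmetic sketch. It assumes that PostgreSQL 10's dsm.c sizes the slot table as 64 fixed slots plus 2 per backend. It also assumes that MaxBackends is max_connections + autovacuum_max_workers + 1 + max_worker_processes. Both are assumptions to verify against the source of the version actually running, and the helper names are hypothetical.

```c
#include <assert.h>

/* ASSUMED values from PostgreSQL 10's src/backend/storage/ipc/dsm.c;
 * verify against the source of the version you actually run. */
#define ASSUMED_FIXED_SLOTS        64
#define ASSUMED_SLOTS_PER_BACKEND   2

/* Hypothetical helper: approximates MaxBackends as (we assume) computed
 * at startup in PostgreSQL 10: connections + autovacuum workers
 * + 1 autovacuum launcher + background worker processes. */
static int approx_max_backends(int max_connections,
                               int autovacuum_max_workers,
                               int max_worker_processes)
{
    assert(max_connections >= 0 && autovacuum_max_workers >= 0 &&
           max_worker_processes >= 0);
    return max_connections + autovacuum_max_workers + 1 + max_worker_processes;
}

/* Hypothetical helper: total DSM slots in the control segment under the
 * assumed formula fixed + per_backend * MaxBackends. */
static int approx_dsm_slots(int max_backends)
{
    return ASSUMED_FIXED_SLOTS + ASSUMED_SLOTS_PER_BACKEND * max_backends;
}
```

Under these assumptions, max_connections = 100 with the other settings at 3 and 8 gives 112 backends and 288 slots. Each extra connection adds only two slots, so raising max_connections lifts the ceiling linearly. Whether that is enough depends on how many segments the concurrent parallel queries actually request.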



--
regards,
Jakub Glapa

On Mon, Nov 27, 2017 at 11:48 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub.gl...@gmail.com> wrote:
> > As for the crash: I dug up the initial log and it looks like a segmentation
> > fault...
> >
> > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:  too many dynamic shared memory segments
>
> Hmm.  Well, this error can only occur in dsm_create() called without
> DSM_CREATE_NULL_IF_MAXSEGMENTS.  parallel.c calls it with that flag
> and dsa.c doesn't (perhaps it should, not sure, but that'd just change
> the error message), so the error must have arisen from dsa.c
> trying to get more segments.  That would be when Parallel Bitmap Heap
> Scan tried to allocate memory.
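To illustrate the two calling conventions described above, here is a toy model of a fixed slot table. It is not PostgreSQL's dsm.c. TOY_CREATE_NULL_IF_MAXSEGMENTS is a hypothetical stand-in for DSM_CREATE_NULL_IF_MAXSEGMENTS, and the real ereport(ERROR) non-local exit is modelled here as a flag plus a NULL return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model, NOT PostgreSQL's dsm.c: a fixed table of
 * segment slots (5, like the hacked build described below). */
#define TOY_MAX_SEGMENTS 5

/* Hypothetical flag standing in for DSM_CREATE_NULL_IF_MAXSEGMENTS. */
#define TOY_CREATE_NULL_IF_MAXSEGMENTS 0x1

typedef struct ToySegment { int slot; } ToySegment;

static ToySegment toy_slots[TOY_MAX_SEGMENTS];
static bool toy_in_use[TOY_MAX_SEGMENTS];
/* Stands in for ereport(ERROR); the real code would longjmp away. */
static bool toy_error_raised = false;

static ToySegment *toy_dsm_create(int flags)
{
    /* Take the first free slot, if any. */
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++) {
        if (!toy_in_use[i]) {
            toy_in_use[i] = true;
            toy_slots[i].slot = i;
            return &toy_slots[i];
        }
    }
    /* All slots taken: with the flag, quietly return NULL and let the
     * caller (like parallel.c) cope. */
    if (flags & TOY_CREATE_NULL_IF_MAXSEGMENTS)
        return NULL;
    /* Without the flag (like dsa.c), report the error. */
    fprintf(stderr, "ERROR:  too many dynamic shared memory segments\n");
    toy_error_raised = true;
    return NULL;
}

static void toy_dsm_detach(ToySegment *seg)
{
    assert(seg != NULL && toy_in_use[seg->slot]);
    toy_in_use[seg->slot] = false;
}
```

Once all five slots are taken, a flagged call returns NULL silently, while an unflagged call reports the "too many dynamic shared memory segments" error.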
>
> I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
> managed to reproduce a segv crash by trying to run concurrent Parallel
> Bitmap Heap Scans.  The stack looks like this:
>
>   * frame #0: 0x00000001083ace29 postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at dsa.c:1433
>     frame #1: 0x00000001083acd14 postgres`dsa_allocate_extended(area=0x0000000000000000, size=72, flags=4) + 1076 at dsa.c:785
>     frame #2: 0x0000000108059c33 postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at tidbitmap.c:780
>     frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
>     frame #4: 0x0000000107fefc5b postgres`ExecScanFetch(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 459 at execScan.c:95
>     frame #5: 0x0000000107fef983 postgres`ExecScan(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 147 at execScan.c:162
>     frame #6: 0x00000001080008d1 postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at nodeBitmapHeapscan.c:735
>
> (lldb) f 3
> frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
>    153 * dsa_pointer of the iterator state which will be used by
>    154 * multiple processes to iterate jointly.
>    155 */
> -> 156 pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
>    157 #ifdef USE_PREFETCH
>    158 if (node->prefetch_maximum > 0)
>    159
> (lldb) print tbm->dsa
> (dsa_area *) $3 = 0x0000000000000000
> (lldb) print node->ss.ps.state->es_query_dsa
> (dsa_area *) $5 = 0x0000000000000000
> (lldb) f 17
> frame #17: 0x000000010800363b postgres`ExecGather(pstate=0x00007f9743019320) + 635 at nodeGather.c:220
>    217 * Get next tuple, either from one of our workers, or by running the plan
>    218 * ourselves.
>    219 */
> -> 220 slot = gather_getnext(node);
>    221 if (TupIsNull(slot))
>    222 return NULL;
>    223
> (lldb) print *node->pei
> (ParallelExecutorInfo) $8 = {
>   planstate = 0x00007f9743019640
>   pcxt = 0x00007f97450001b8
>   buffer_usage = 0x0000000108b7e218
>   instrumentation = 0x0000000108b7da38
>   area = 0x0000000000000000
>   param_exec = 0
>   finished = '\0'
>   tqueue = 0x0000000000000000
>   reader = 0x0000000000000000
> }
> (lldb) print *node->pei->pcxt
> warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
> (ParallelContext) $9 = {
>   node = {
>     prev = 0x000000010855fb60
>     next = 0x000000010855fb60
>   }
>   subid = 1
>   nworkers = 0
>   nworkers_launched = 0
>   library_name = 0x00007f9745000248 "postgres"
>   function_name = 0x00007f9745000268 "ParallelQueryMain"
>   error_context_stack = 0x0000000000000000
>   estimator = (space_for_chunks = 180352, number_of_keys = 19)
>   seg = 0x0000000000000000
>   private_memory = 0x0000000108b53038
>   toc = 0x0000000108b53038
>   worker = 0x0000000000000000
> }
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good: it ran out of slots, said so, and
> our error machinery worked as it should), and another crashed with a
> segfault because it tried to use a NULL "area" pointer (bad).  I
> think this is a degenerate case where we completely failed to launch
> parallel query (note that pcxt->seg, pei->area and es_query_dsa are
> all NULL and nworkers is 0 above), but we ran the parallel query plan
> anyway and this code thinks that the DSA is available.  Oops.
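The dump is consistent with parallel setup having fallen back to backend-private memory, so no DSA was ever created. In the dump, seg is NULL, nworkers is 0 and toc equals private_memory. The toy sketch below models that fallback and one possible guard: prepare a shared iterator only when an area exists, otherwise use a backend-local one. All types and function names are hypothetical and loosely modelled on the dumped fields. This illustrates the failure mode; it is not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins loosely modelled on the dumped fields
 * (pcxt->seg, pcxt->nworkers, pei->area); not the real structs. */
typedef struct ToyArea { int dummy; } ToyArea;

typedef struct ToyParallelSetup {
    void    *seg;       /* NULL when no DSM slot could be obtained */
    int      nworkers;  /* forced to 0 when seg is NULL */
    ToyArea *area;      /* only created inside a real segment */
} ToyParallelSetup;

static ToyArea toy_area_storage;

/* Models the fallback: without a segment, run with zero workers in
 * private memory and leave area NULL. */
static ToyParallelSetup toy_setup(bool slot_available, int requested_workers)
{
    static int toy_segment_storage;
    ToyParallelSetup s = { NULL, requested_workers, NULL };

    if (slot_available) {
        s.seg = &toy_segment_storage;
        s.area = &toy_area_storage;
    } else {
        s.nworkers = 0;
    }
    return s;
}

typedef enum { ITER_SHARED, ITER_PRIVATE } IterKind;

/* Hypothetical guard: never hand a NULL area down to the allocator
 * (the crash in frame #0); use a backend-local iterator instead. */
static IterKind toy_choose_iterator(const ToyParallelSetup *s)
{
    assert(s != NULL);
    return (s->area != NULL) ? ITER_SHARED : ITER_PRIVATE;
}
```

In the degraded case the plan still runs, just without workers, which is why checking for the area before preparing a shared iterator avoids the NULL dereference.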
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
