On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu <[email protected]> wrote:
> This is the error on postgres log of the segmentation fault :
>
> 2020-01-21 14:20:29 GMT [] [42222]: [108-1] db=,user= LOG: server process
> (PID 2042) was terminated by signal 11: Segmentation fault
> 2020-01-21 14:20:29 GMT [] [42222]: [109-1] db=,user= DETAIL: Failed process
> was running: select pid from pg_stat_activity where query ilike 'REFRESH
> MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats'
> 2020-01-21 14:20:29 GMT [] [42222]: [110-1] db=,user= LOG: terminating any
> other active server processes
Ok, this is a bug. Do you happen to have a core file? I don't recall
where CentOS puts them.
> > If you're on Linux, you can probably see them with "ls /dev/shm".
>
> I see a lot of files there, and doing a cat they are empty. What can I do
> with them?
Not much, but they tell you approximately how many 'slots' are in use at
a given time (i.e. because of currently running parallel queries), assuming
they were created since PostgreSQL started up. If they're older they could
have leaked from a crashed server, but we try to avoid that by cleaning
them up when you restart.
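For context, the total number of slots is fixed when the postmaster starts
up; from memory the sizing in dsm.c looks roughly like the fragment below
(PG_DYNSHMEM_FIXED_SLOTS, PG_DYNSHMEM_SLOTS_PER_BACKEND and MaxBackends are
real identifiers, but the exact values differ between releases, so treat
this as illustrative only):

	/*
	 * Rough paraphrase of the slot sizing in dsm_postmaster_startup():
	 * a fixed pool of slots plus a few per allowed backend.  Parallel
	 * queries and parallel index builds each consume slots while they
	 * run, so a burst of parallel activity can exhaust the table.
	 */
	maxitems = PG_DYNSHMEM_FIXED_SLOTS +
		PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;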
> Those are two different problems I guess, but they are related because right
> before the Segmentation Fault I see a lot of shared segment errors in the
> postgres log.
That gave me an idea... I hacked my copy of PostgreSQL to flip a coin
to decide whether to pretend there are no slots free (see below), and
I managed to make it crash in the regression tests when doing a
parallel index build. It's late here now, but I'll look into that
tomorrow. It's possible that the parallel index code needs to learn
to cope with that.
#2 0x0000000000a096f6 in SharedFileSetInit (fileset=0x80b2fe14c, seg=0x0) at sharedfileset.c:71
#3 0x0000000000c72440 in tuplesort_initialize_shared (shared=0x80b2fe140, nWorkers=2, seg=0x0) at tuplesort.c:4341
#4 0x00000000005ab405 in _bt_begin_parallel (buildstate=0x7fffffffc070, isconcurrent=false, request=1) at nbtsort.c:1402
#5 0x00000000005aa7c7 in _bt_spools_heapscan (heap=0x801ddd7e8, index=0x801dddc18, buildstate=0x7fffffffc070, indexInfo=0x80b2b62d0) at nbtsort.c:396
#6 0x00000000005aa695 in btbuild (heap=0x801ddd7e8, index=0x801dddc18, indexInfo=0x80b2b62d0) at nbtsort.c:328
#7 0x0000000000645b5c in index_build (heapRelation=0x801ddd7e8, indexRelation=0x801dddc18, indexInfo=0x80b2b62d0, isreindex=false, parallel=true) at index.c:2879
#8 0x0000000000643e5c in index_create (heapRelation=0x801ddd7e8, indexRelationName=0x7fffffffc510 "pg_toast_24587_index", indexRelationId=24603, parentIndexRelid=0,
I don't know if that's the bug that you're hitting, but it definitely
could be: REFRESH MATERIALIZED VIEW could be rebuilding an index.
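If that's what's happening, presumably _bt_begin_parallel() needs to notice
that it didn't get a DSM segment and fall back to a serial build.  Untested
sketch of the idea (details to be checked; IIUC InitializeParallelDSM()
leaves pcxt->seg NULL when dsm_create() hits the limit, because it passes
DSM_CREATE_NULL_IF_MAXSEGMENTS):

	/* ... in _bt_begin_parallel(), after setting up the context ... */
	InitializeParallelDSM(pcxt);

	/*
	 * Untested sketch: if no DSM segment was available, clean up and
	 * return without a leader, so the caller does a serial build.
	 */
	if (pcxt->seg == NULL)
	{
		if (IsMVCCSnapshot(snapshot))
			UnregisterSnapshot(snapshot);
		DestroyParallelContext(pcxt);
		ExitParallelMode();
		return;
	}

The coin-flip hack I used to reproduce this is below.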
===
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 90e0d739f8..f0b49d94ee 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -468,6 +468,13 @@ dsm_create(Size size, int flags)
 	nitems = dsm_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
+		/* BEGIN HACK */
+		if (random() % 10 > 5)
+		{
+			nitems = dsm_control->maxitems;
+			break;
+		}
+		/* END HACK */
 		if (dsm_control->item[i].refcnt == 0)
 		{
 			dsm_control->item[i].handle = seg->handle;