Re: DSM robustness failure (was Re: Peripatus/failures)

Tom Lane Wed, 17 Oct 2018 18:38:30 -0700

Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> ... However, I'm still slightly interested in how it
>> was that that broke DSM so thoroughly ...


> Me too.  Frustratingly, that vm object might still exist on Larry's
> machine if it hasn't been rebooted (since we failed to shm_unlink()
> it), so if we knew its name we could write a program to shm_open(),
> mmap(), dump out to a file for analysis and then we could work out
> which of the sanity tests it failed and maybe get some clues.
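
A rough sketch of such a dump program is below, assuming the object's name were known (it is not known from anything in this thread, so the caller must supply it). The function name dump_shm_object and the output path are hypothetical and exist only for illustration. The sketch opens the object read-only and never unlinks it, so the evidence is left in place.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: copy the contents of an
 * existing POSIX shared memory object into a regular file.
 * Returns 0 on success, -1 on any failure.
 */
int
dump_shm_object(const char *shm_name, const char *out_path)
{
    struct stat st;
    void       *addr;
    FILE       *out;
    int         fd;
    int         rc = -1;

    /* Open read-only; no O_CREAT, so a missing object is an error. */
    fd = shm_open(shm_name, O_RDONLY, 0);
    if (fd < 0)
        return -1;

    /* The object's current size tells us how much to map and dump. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0)
    {
        close(fd);
        return -1;
    }

    addr = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    /* Write the mapped bytes out verbatim for offline analysis. */
    out = fopen(out_path, "wb");
    if (out != NULL)
    {
        if (fwrite(addr, 1, (size_t) st.st_size, out) == (size_t) st.st_size)
            rc = 0;
        if (fclose(out) != 0)
            rc = -1;
    }

    munmap(addr, (size_t) st.st_size);
    return rc;
}
```

A caller would pass the object's name and an output file, then compare the dumped bytes by hand against the sanity tests. On some platforms shm_open() requires linking with -lrt.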

Larry's REL_10_STABLE failure logs are interesting:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2020%3A42%3A17

2018-10-17 15:48:08.849 CDT [55240:7] LOG:  dynamic shared memory control segment is corrupt
2018-10-17 15:48:08.849 CDT [55240:8] LOG:  sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:9] LOG:  sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:10] LOG:  sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:11] LOG:  sem_destroy failed: Invalid argument
... lots more ...
2018-10-17 15:48:08.862 CDT [55240:122] LOG:  sem_destroy failed: Invalid argument
2018-10-17 15:48:08.862 CDT [55240:123] LOG:  sem_destroy failed: Invalid argument
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 182)

So at least in this case, not only did we lose the DSM segment but also
all of our semaphores.  Is it conceivable that Python somehow destroyed
those objects, rather than stomping on the contents of the DSM segment?
If not, how do we explain this log?

Also, why is there branch-specific variation?  The fact that v11 and HEAD
aren't whinging about lost semaphores is not hard to understand --- we
stopped using SysV semas.  But why don't the older branches look like v10
here?

                        regards, tom lane
