Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> ... However, I'm still slightly interested in how it
>> was that that broke DSM so thoroughly ...
> Me too.  Frustratingly, that vm object might still exist on Larry's
> machine if it hasn't been rebooted (since we failed to shm_unlink()
> it), so if we knew its name we could write a program to shm_open(),
> mmap(), dump out to a file for analysis and then we could work out
> which of the sanity tests it failed and maybe get some clues.

Larry's REL_10_STABLE failure logs are interesting:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2020%3A42%3A17

2018-10-17 15:48:08.849 CDT [55240:7] LOG: dynamic shared memory control segment is corrupt
2018-10-17 15:48:08.849 CDT [55240:8] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:9] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:10] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:11] LOG: sem_destroy failed: Invalid argument
... lots more ...
2018-10-17 15:48:08.862 CDT [55240:122] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.862 CDT [55240:123] LOG: sem_destroy failed: Invalid argument
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 182)

So at least in this case, not only did we lose the DSM segment but also
all of our semaphores.  Is it conceivable that Python somehow destroyed
those objects, rather than stomping on the contents of the DSM segment?
If not, how do we explain this log?

Also, why is there branch-specific variation?  The fact that v11 and HEAD
aren't whinging about lost semaphores is not hard to understand --- we
stopped using SysV semas.  But why don't the older branches look like
v10 here?

			regards, tom lane
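The inspection program Thomas describes (shm_open() the leaked object, mmap() it, dump it to a file) might look roughly like the C sketch below. It is a hypothetical helper, not code from the PostgreSQL tree: the function name dump_shm_segment() and its arguments are made up for illustration, and because the segment's name is not known, the caller has to supply it. The object is opened read-only and is deliberately not shm_unlink()ed, so running the dump leaves it in place for further analysis.

```c
#define _POSIX_C_SOURCE 200809L /* expose shm_open() etc. in strict C modes */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the contents of an existing POSIX shared
 * memory object (e.g. a leaked DSM segment) to a regular file.
 * The caller must know the object's name.  Returns 0 on success,
 * -1 on failure (errno describes the cause).
 */
int
dump_shm_segment(const char *name, const char *outpath)
{
    /* Open read-only and without O_CREAT: only dump an object that exists. */
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0)
        return -1;

    /* The object's current size tells us how much to map. */
    struct stat st;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return -1;
    }
    size_t size = (size_t) st.st_size;

    /* mmap() rejects zero-length mappings, so only map a non-empty object. */
    void *p = MAP_FAILED;
    if (size > 0)
    {
        p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
        {
            close(fd);
            return -1;
        }
    }
    close(fd);                  /* the mapping stays valid after close() */

    /* Write the raw bytes out; an empty object yields an empty file. */
    int rc = -1;
    FILE *out = fopen(outpath, "wb");
    if (out != NULL)
    {
        size_t written = size > 0 ? fwrite(p, 1, size, out) : 0;
        if (fclose(out) == 0 && written == size)
            rc = 0;
    }

    if (size > 0)
        munmap(p, size);

    /* Note: no shm_unlink(); the object is left exactly where it was. */
    return rc;
}
```

A call would look like dump_shm_segment("/segment-name", "dsm-dump.bin"), where "/segment-name" stands in for the unknown name. On Linux with glibc older than 2.34, shm_open() needs linking with -lrt. The resulting file could then be compared against whatever sanity checks the control segment failed.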