On Mon, Apr 30, 2018 at 08:01:40PM -0400, Tom Lane wrote:
> It's clear from dory's results that something is causing a 4MB chunk
> of memory to get reserved in the process's address space, sometimes.
> It might happen during the main MapViewOfFileEx call, or during the
> preceding VirtualFree, or with my map/unmap dance in place, it might
> happen during that.  Frequently it doesn't happen at all, at least not
> before the point where we've successfully done MapViewOfFileEx.  But
> if it does happen, and the chunk happens to get put in a spot that
> overlaps where we want to put the shmem block, kaboom.
>
> What seems like a plausible theory at this point is that the apparent
> asynchronicity is due to the allocation being triggered by a different
> thread, and the fact that our added monitoring code seems to make the
> failure more likely can be explained by that code changing the timing.
> But what thread could it be?  It doesn't really look to me like either
> the signal thread or the timer thread could eat 4MB.  syslogger.c
> also spawns a thread, on Windows, but AFAICS that's not being used in
> this test configuration.  Maybe the reason dory is showing the problem
> is something or other is spawning a thread we don't even know about?

Likely some privileged daemon is creating a thread in every new process.  (On
Windows, it's not unusual for one process to create a thread in another
process.)  We don't have good control over that.

> I'm at a loss for a reasonable way to fix it
> for real.  Is there a way to seize control of a Windows process so that
> there are no other running threads?

I think not.

> Any other ideas?

PostgreSQL could retry the whole process creation, analogous to
internal_forkexec() retries.  Have the failed process exit after recording the
fact that it couldn't attach.  Make the postmaster notice and spawn a
replacement.  Give up after 100 failed attempts.
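
To make the retry idea concrete, here is a rough standalone sketch of the control flow, not PostgreSQL code. Everything named in it is an assumption for illustration: spawn_with_retry(), the spawn_fn callback, the EXIT_ATTACH_FAILED exit code, and the fake_spawn() stand-in that simulates a child failing to attach a fixed number of times. Only the limit of 100 attempts comes from the proposal above.

```c
#include <stdio.h>

/* Limit taken from the proposal: give up after 100 failed attempts. */
#define MAX_SPAWN_ATTEMPTS 100

/* Hypothetical exit code meaning "child could not attach to shared memory". */
#define EXIT_ATTACH_FAILED 3

/*
 * Hypothetical callback: start one child, wait until it has either attached
 * or exited, and return its status (0 = attached successfully).
 */
typedef int (*spawn_fn) (void *arg);

/*
 * Retry the whole process creation while the child reports an attach
 * failure.  Returns the number of attempts used on success, or -1 if we
 * gave up or the child failed for an unrelated reason.
 */
int
spawn_with_retry(spawn_fn spawn, void *arg)
{
    for (int attempt = 1; attempt <= MAX_SPAWN_ATTEMPTS; attempt++)
    {
        int         rc = spawn(arg);

        if (rc == 0)
            return attempt;     /* child attached; done */
        if (rc != EXIT_ATTACH_FAILED)
            return -1;          /* other failure; retrying won't help */

        fprintf(stderr,
                "child could not attach to shared memory "
                "(attempt %d of %d), respawning\n",
                attempt, MAX_SPAWN_ATTEMPTS);
    }
    return -1;                  /* every attempt failed to attach */
}

/*
 * Stand-in for real process creation, for demonstration only: *arg holds
 * how many more times the simulated child should fail to attach.
 */
int
fake_spawn(void *arg)
{
    int        *failures_left = arg;

    if (*failures_left > 0)
    {
        (*failures_left)--;
        return EXIT_ATTACH_FAILED;
    }
    return 0;
}
```

With a counter of 2 passed to fake_spawn(), spawn_with_retry() succeeds on the third attempt; with a counter of 100 it gives up and returns -1.  The sketch blocks on each child for simplicity.  In the postmaster the "notice and spawn a replacement" step would presumably hang off its existing child-exit handling instead, with the failed child recording the attach failure somewhere the postmaster can see it.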