On Sun, Dec 02, 2018 at 09:35:06PM -0800, Noah Misch wrote:
> On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote:
> > On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
> > > Overall, I agree that neither of these approaches are exactly attractive.
> > > We're paying a heck of a lot of performance or complexity to solve a
> > > problem that shouldn't even be there, and that we don't understand well.
> > > In particular, the theory that some privileged code is injecting a thread
> > > into every new process doesn't square with my results at
> > > https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us
> > > 
> > > I think our best course of action at this point is to do nothing until
> > > we have a clearer understanding of what's actually happening on dory.
> > > Perhaps such understanding will yield an idea for a less painful fix.
> 
> Could one of you having a dory login use
> https://live.sysinternals.com/Procmon.exe to capture process events during
> backend startup?  The ideal would be one capture where startup failed reattach
> and another where it succeeded, but having the successful run alone would be a
> good start.

Joseph Ayers provided, off-list, the capture from a successful startup.  It
wasn't materially different from the one my system generates, so I abandoned
that line of inquiry.  Having explored other aspects of the problem, I expect
the attached fix will work.  I can reproduce the 4 MiB allocations described
in https://postgr.es/m/29823.1525132...@sss.pgh.pa.us; a few times per
"vcregress check", they emerge in the middle of PGSharedMemoryReAttach().  On
my system, there's 5.7 MiB of free address space just before UsedShmemSegAddr,
so the 4 MiB allocation fits in there, and PGSharedMemoryReAttach() does not
fail.  Still, it's easy to imagine that boring variations between environments
could surface dory's problem by reducing that free 5.7 MiB to, say, 3.9 MiB.

The 4 MiB allocations are stacks for new threads of the default thread
pool[1].  (I confirmed that by observing their size change when I changed
StackReserveSize in MSBuildProject.pm and by checking all stack pointers with
"thread apply all info frame" in gdb.)  The API calls in
PGSharedMemoryReAttach() don't cause the thread creation; it's a timing
coincidence.  Commit 2307868 would have worked around the problem, but
pg_usleep() is essentially a no-op on Windows before
pgwin32_signal_initialize() runs.  (I'll push Assert(pgwin32_signal_event) to
some functions.)  While one fix is to block until all expected threads have
started, that could be notably slow, and I don't know how to implement it
cleanly.  I think a better fix is to arrange for the system to prefer a
different address space region for these thread stacks; for details, see the
first comment the patch adds to win32_shmem.c.  This works here.

> backend startup sees six thread creations:
> 
> 1. main thread
> 2. thread created before postgres.exe has control
> 3. thread created before postgres.exe has control
> 4. thread created before postgres.exe has control
> 5. in pgwin32_signal_initialize()
> 6. in src\backend\port\win32\timer.c:setitimer()
> 
> Threads 2-4 exit exactly 30s after creation.  If we fail to reattach to shared
> memory, we'll exit before reaching code to start 5 or 6.

Threads 2-4 proved to be worker threads of the default thread pool.

[1] https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pools
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index dd91fe5..7d8961f 100644
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index bc1e946..eed17c5 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -17,6 +17,28 @@
 #include "storage/ipc.h"
 #include "storage/pg_shmem.h"
 
+/*
+ * Early in a process's life, Windows asynchronously creates threads for the
+ * process's "default thread pool"
+ * (https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pools).
+ * Occasionally, thread creation allocates a stack after
+ * PGSharedMemoryReAttach() has released UsedShmemSegAddr and before it has
+ * mapped shared memory at UsedShmemSegAddr.  This would cause mapping to fail
+ * if the allocator preferred the just-released region for allocating the new
+ * thread stack.  We observed such failures in some Windows Server 2016
+ * configurations.  To give the system another region to prefer, reserve and
+ * release an additional, protective region immediately before reserving or
+ * releasing shared memory.  The idea is that, if the allocator handed out
+ * REGION1 pages before REGION2 pages at one occasion, it will do so whenever
+ * both regions are free.  Windows Server 2016 exhibits that behavior, and a
+ * system behaving differently would have less need to protect
+ * UsedShmemSegAddr.  The protective region must be at least large enough for
+ * one thread stack.  However, ten times as much is less than 2% of the 32-bit
+ * address space and is negligible relative to the 64-bit address space.
+ */
+#define PROTECTIVE_REGION_SIZE (10 * WIN32_STACK_RLIMIT)
+void      *ShmemProtectiveRegion = NULL;
+
 HANDLE         UsedShmemSegID = INVALID_HANDLE_VALUE;
 void      *UsedShmemSegAddr = NULL;
 static Size UsedShmemSegSize = 0;
@@ -192,6 +214,12 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port,
        Size            orig_size = size;
        DWORD           flProtect = PAGE_READWRITE;
 
+       ShmemProtectiveRegion = VirtualAlloc(NULL, PROTECTIVE_REGION_SIZE,
+                                                                               
 MEM_RESERVE, PAGE_NOACCESS);
+       if (ShmemProtectiveRegion == NULL)
+               elog(FATAL, "could not reserve memory region: error code %lu",
+                        GetLastError());
+
        /* Room for a header? */
        Assert(size > MAXALIGN(sizeof(PGShmemHeader)));
 
@@ -370,9 +398,9 @@ retry:
  * an already existing shared memory segment, using the handle inherited from
  * the postmaster.
  *
- * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
- * routine.  The caller must have already restored them to the postmaster's
- * values.
+ * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
+ * parameters to this routine.  The caller must have already restored them to
+ * the postmaster's values.
  */
 void
 PGSharedMemoryReAttach(void)
@@ -380,12 +408,16 @@ PGSharedMemoryReAttach(void)
        PGShmemHeader *hdr;
        void       *origUsedShmemSegAddr = UsedShmemSegAddr;
 
+       Assert(ShmemProtectiveRegion != NULL);
        Assert(UsedShmemSegAddr != NULL);
        Assert(IsUnderPostmaster);
 
        /*
-        * Release memory region reservation that was made by the postmaster
+        * Release memory region reservations made by the postmaster
         */
+       if (VirtualFree(ShmemProtectiveRegion, 0, MEM_RELEASE) == 0)
+               elog(FATAL, "failed to release reserved memory region 
(addr=%p): error code %lu",
+                        ShmemProtectiveRegion, GetLastError());
        if (VirtualFree(UsedShmemSegAddr, 0, MEM_RELEASE) == 0)
                elog(FATAL, "failed to release reserved memory region 
(addr=%p): error code %lu",
                         UsedShmemSegAddr, GetLastError());
@@ -414,13 +446,14 @@ PGSharedMemoryReAttach(void)
  * The child process startup logic might or might not call PGSharedMemoryDetach
  * after this; make sure that it will be a no-op if called.
  *
- * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
- * routine.  The caller must have already restored them to the postmaster's
- * values.
+ * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
+ * parameters to this routine.  The caller must have already restored them to
+ * the postmaster's values.
  */
 void
 PGSharedMemoryNoReAttach(void)
 {
+       Assert(ShmemProtectiveRegion != NULL);
        Assert(UsedShmemSegAddr != NULL);
        Assert(IsUnderPostmaster);
 
@@ -447,12 +480,25 @@ PGSharedMemoryNoReAttach(void)
  * Rather, this is for subprocesses that have inherited an attachment and want
  * to get rid of it.
  *
- * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
- * routine.
+ * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
+ * parameters to this routine.
  */
 void
 PGSharedMemoryDetach(void)
 {
+       /*
+        * Releasing the protective region liberates an unimportant quantity of
+        * address space, but be tidy.
+        */
+       if (ShmemProtectiveRegion != NULL)
+       {
+               if (VirtualFree(ShmemProtectiveRegion, 0, MEM_RELEASE) == 0)
+                       elog(LOG, "failed to release reserved memory region 
(addr=%p): error code %lu",
+                                ShmemProtectiveRegion, GetLastError());
+
+               ShmemProtectiveRegion = NULL;
+       }
+
        /* Unmap the view, if it's mapped */
        if (UsedShmemSegAddr != NULL)
        {
@@ -510,19 +556,22 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
 {
        void       *address;
 
+       Assert(ShmemProtectiveRegion != NULL);
        Assert(UsedShmemSegAddr != NULL);
        Assert(UsedShmemSegSize != 0);
 
-       address = VirtualAllocEx(hChild, UsedShmemSegAddr, UsedShmemSegSize,
-                                                        MEM_RESERVE, 
PAGE_READWRITE);
+       /* ShmemProtectiveRegion */
+       address = VirtualAllocEx(hChild, ShmemProtectiveRegion,
+                                                        PROTECTIVE_REGION_SIZE,
+                                                        MEM_RESERVE, 
PAGE_NOACCESS);
        if (address == NULL)
        {
                /* Don't use FATAL since we're running in the postmaster */
                elog(LOG, "could not reserve shared memory region (addr=%p) for 
child %p: error code %lu",
-                        UsedShmemSegAddr, hChild, GetLastError());
+                        ShmemProtectiveRegion, hChild, GetLastError());
                return false;
        }
-       if (address != UsedShmemSegAddr)
+       if (address != ShmemProtectiveRegion)
        {
                /*
                 * Should never happen - in theory if allocation granularity 
causes
@@ -531,8 +580,23 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
                 * Don't use FATAL since we're running in the postmaster.
                 */
                elog(LOG, "reserved shared memory region got incorrect address 
%p, expected %p",
+                        address, ShmemProtectiveRegion);
+               return false;
+       }
+
+       /* UsedShmemSegAddr */
+       address = VirtualAllocEx(hChild, UsedShmemSegAddr, UsedShmemSegSize,
+                                                        MEM_RESERVE, 
PAGE_READWRITE);
+       if (address == NULL)
+       {
+               elog(LOG, "could not reserve shared memory region (addr=%p) for 
child %p: error code %lu",
+                        UsedShmemSegAddr, hChild, GetLastError());
+               return false;
+       }
+       if (address != UsedShmemSegAddr)
+       {
+               elog(LOG, "reserved shared memory region got incorrect address 
%p, expected %p",
                         address, UsedShmemSegAddr);
-               VirtualFreeEx(hChild, address, 0, MEM_RELEASE);
                return false;
        }
 
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index fe59963..6ab1dda 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -484,6 +484,7 @@ typedef struct
 #ifndef WIN32
        unsigned long UsedShmemSegID;
 #else
+       void       *ShmemProtectiveRegion;
        HANDLE          UsedShmemSegID;
 #endif
        void       *UsedShmemSegAddr;
@@ -6011,6 +6012,9 @@ save_backend_variables(BackendParameters *param, Port 
*port,
        param->MyCancelKey = MyCancelKey;
        param->MyPMChildSlot = MyPMChildSlot;
 
+#ifdef WIN32
+       param->ShmemProtectiveRegion = ShmemProtectiveRegion;
+#endif
        param->UsedShmemSegID = UsedShmemSegID;
        param->UsedShmemSegAddr = UsedShmemSegAddr;
 
@@ -6244,6 +6248,9 @@ restore_backend_variables(BackendParameters *param, Port 
*port)
        MyCancelKey = param->MyCancelKey;
        MyPMChildSlot = param->MyPMChildSlot;
 
+#ifdef WIN32
+       ShmemProtectiveRegion = param->ShmemProtectiveRegion;
+#endif
        UsedShmemSegID = param->UsedShmemSegID;
        UsedShmemSegAddr = param->UsedShmemSegAddr;
 
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 57d3107..4e1e13d 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -65,6 +65,7 @@ typedef enum
 extern unsigned long UsedShmemSegID;
 #else
 extern HANDLE UsedShmemSegID;
+extern void *ShmemProtectiveRegion;
 #endif
 extern void *UsedShmemSegAddr;
 
diff --git a/src/port/open.c b/src/port/open.c
index 436cbae..f37afc7 100644

Reply via email to