Re: WL_SOCKET_ACCEPT fairness on Windows

Andres Freund Fri, 31 Mar 2023 18:35:48 -0700

Hi,

On 2023-04-01 13:42:21 +1300, Thomas Munro wrote:
> Commit 7389aad6 started using WaitEventSetWait() to wait for incoming
> connections.  Before that, we used select(), for which we have our own
> implementation for Windows.
> 
> While hacking on patches to rip a bunch of unused Win32 socket wrapper
> code out, I twigged that the old pgwin32_select() code was careful to
> report multiple sockets at once by brute force polling of all of them,
> while WaitEventSetWait() doesn't do that: it reports just one event,
> because that's what the Windows WaitForMultipleEvents() syscall does.
> I guess this means you can probably fill up the listen queue of server
> socket 1 to prevent server socket 2 from ever being serviced, whereas
> on Unix we'll accept one connection at a time from each in round-robin
> fashion.


It does indeed sound like it'd behave that way:

> If bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 indicates the
> lpHandles array index of the object that satisfied the wait. If more than
> one object became signaled during the call, this is the array index of the
> signaled object with the smallest index value of all the signaled objects.

I wonder if we ought to bite the bullet and replace the use of
WaitForMultipleObjects() with RegisterWaitForSingleObject() and then use
GetQueuedCompletionStatus() to wait. The fairness issue here is a motivation,
but the bigger one is that that'd get us out from under the
MAXIMUM_WAIT_OBJECTS (IIRC 64) limit. Afaict that'd also allow us to read
multiple notifications at once, using GetQueuedCompletionStatusEx().

Medium term that'd also be a small step towards using readiness based APIs in
a few places...


> I think we could get the same effect as pgwin32_select() more cheaply
> by doing the initial WaitForMultipleEvents() call with the caller's
> timeout exactly as we do today, and then, while we have space,
> repeatedly calling
> WaitForMultipleEvents(handles=&events[last_signaled_event_index + 1],
> timeout=0) until it reports timeout.

That would make sense, and should indeed be reasonable cost-wise.


> I mention this now because I'm not sure whether to consider this an
> 'open item' for 16, or merely an enhancement for 17.  I guess the
> former, because someone might call that a new denial of service
> vector.  On the other hand, if you fill up the listen queue for socket
> 1 with enough vigour, you're also denying service to socket 1, so I
> don't know if it's worth worrying about.  Opinions on that?

I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?

> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index f4123e7de7..cc7b572008 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -2025,6 +2025,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int 
> cur_timeout,
>        */
>       cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
>  
> +loop:
> +

As far as I can tell, we'll never see WL_LATCH_SET or WL_POSTMASTER_DEATH. I
think it'd look cleaner to move the body of if (cur_event->events & 
WL_SOCKET_MASK)
into a separate function that we then also can call further down.


>       occurred_events->pos = cur_event->pos;
>       occurred_events->user_data = cur_event->user_data;
>       occurred_events->events = 0;
> @@ -2044,6 +2046,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int 
> cur_timeout,
>                       occurred_events->events = WL_LATCH_SET;
>                       occurred_events++;
>                       returned_events++;
> +                     nevents--;
>               }
>       }
>       else if (cur_event->events == WL_POSTMASTER_DEATH)
> @@ -2063,6 +2066,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int 
> cur_timeout,
>                       occurred_events->events = WL_POSTMASTER_DEATH;
>                       occurred_events++;
>                       returned_events++;
> +                     nevents--;
>               }
>       }
>       else if (cur_event->events & WL_SOCKET_MASK)
> @@ -2124,6 +2128,36 @@ WaitEventSetWaitBlock(WaitEventSet *set, int 
> cur_timeout,
>               {
>                       occurred_events++;
>                       returned_events++;
> +                     nevents--;
> +             }
> +     }

Seems like we could use returned_events to get nevents in the way you want it,
without adding even more ++/-- to each of the different events?

Greetings,

Andres Freund

Re: WL_SOCKET_ACCEPT fairness on Windows

Reply via email to