[HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

Robert Haas Tue, 03 May 2016 20:10:03 -0700

On Sun, Mar 20, 2016 at 8:31 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> One thing that I want to do but can't with this interface is remove an
> fd from the set.  I can AddWaitEventToSet returning a position, and I
> can ModifyWait to provide new event mask by position including zero
> mask, I can't actually remove the fd (for example to avoid future
> error events that can't be masked, or to allow that fd to be closed
> and perhaps allow that fd number to coincidentally be readded later,
> and just generally to free the slot).  There is an underlying way to
> remove an fd from a set with poll (sort of), epoll, kqueue.  (Not sure
> about Windows.  But surely...).  I wonder if there should be
> RemoveWaitEventFromSet(set, position) which recycles event slots,
> sticking them on a freelist (and setting corresponding pollfd structs'
> fd to -1).


In practice, you're not likely to want to completely forget about a
file descriptor and wait for some totally new file descriptor, or at
least not very often.  You're much more likely to have a set of fds
that you care about, but the events that you care about for each one
vary over time, and it may often by the empty set for some of those
fds.  For example, in the case of Append -> {Foreign Scan x many}, you
will have an fd for each foreign scan.  When you dispatch a query to a
particular server, you're now interested in whether that fd is
ready-ready (or, if SSL is involved, maybe sometimes you care about
write-ready instead) until you get back all the query results from
that server.  At that point, you no longer care about events on that
fd until you dispatch the next query - and then you again do.

So what's the best API for that?  One idea is to change
ModifyWaitEvent to accept events = 0 instead of failing an assertion
inside WaitEventAdjustEpoll.  We don't want to wait for EPOLLERR |
EPOLLHUP in that case since we'd have to wait event to return if one
of those things occurred.  It would be easy enough to rejigger that
code so that we pass epoll_ev.events as 0 in that case, but I think it
doesn't help: epoll_ctl(2) says epoll_wait(2) always waits for those
events.  Instead I think we'd have to use EPOLL_CTL_DEL in that case,
which is a problem: when the user next calls ModifyWaitEvent, we would
need to use EPOLL_CTL_ADD rather than EPOLL_CTL_MOD, but how would we
know that?  We'd have to add state for that somewhere.

Alternatively, we could implement RemoveWaitEventFromSet(set,
position), as you propose.  That, too, needs some kind of additional
state, because we've got to track which positions are unused.  I'd be
inclined to guess that a bitmap would be better than a linked list;
for the storage space of one pointer, we could handle all
WaitEventSets with nevents <= 64, which is hard to beat.

Yet another idea is to have a new event WL_SOCKET_ERROR which waits
for only EPOLLERR | EPOLLHUP.  So, if you don't care about the socket
being readable or writeable at the moment, but you still vaguely care
about the file descriptor, you can do ModifyWaitEvent(set, pos,
WL_SOCKET_ERROR, NULL).  That's actually kind of nice, because now you
can still detect error/eof conditions on sockets that you are not
otherwise waiting for, and report the errors promptly.  At the moment,
I'm inclined to think this might be the best option...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

Reply via email to