I have a system scenario where thousands of applications are running, and via a service discovery mechanism they are all notified that a service they are all interested in has come online. They all attempt to open a TCP connection to the service, and this happens virtually instantly.
The problem I see is that many of the applications trying to connect end up in a state where they consume a lot of CPU. I am using Python 3.4.2 and asyncio, and have set the server backlog to 4000 in an effort to accommodate the connection-request backlog. I am actually using an event loop from aiozmq (though no ZMQ sockets are involved in this scenario), but under the covers this just uses epoll, so it should really be the same as using the DefaultSelector.

Using strace on the apps exhibiting the issue, I see that a socket is continuously triggering a POLLERR|POLLHUP event, and this is the cause of the high CPU usage. The socket in question is the one that was attempting to connect to the newly started service. I am guessing the POLLHUP is caused by the server having trouble processing the volume of connect requests.

I think I need to drop/close the socket causing the POLLHUP. However, from looking through the asyncio source code I don't see how I can do that from within the _selector.select() or _process_events() functions with only the knowledge of which fd is causing the issue. How do poll errors propagate up from the select loop? I could potentially unregister the fd, but as far as I can tell that won't cause the transport/protocol to be closed, which prevents my normal error handling from attempting to reconnect to the service. The asyncio selector classes seem to ignore events other than EVENT_READ and EVENT_WRITE.

Any help would be appreciated.

Regards,
Chris
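To illustrate that last point, here is a minimal experiment (not my application code; the socketpair is just a stand-in for the failing connection, so I can provoke a hangup locally). It shows that the selectors module folds POLLERR/POLLHUP into the ordinary read/write readiness mask, so the loop keeps waking up without the caller ever being told the fd is dead:

```python
import selectors
import socket

# A connected pair; closing one end provokes a hangup (POLLHUP) on the
# surviving end, much like the server dropping our connection attempt.
a, b = socket.socketpair()
b.close()

sel = selectors.DefaultSelector()
sel.register(a, selectors.EVENT_READ | selectors.EVENT_WRITE)

# The hangup is reported only as read/write readiness; the POLLHUP and
# POLLERR bits themselves never reach the caller, so a callback that
# neither reads the EOF nor closes the fd will spin on this forever.
ready = sel.select(timeout=1.0)
for key, events in ready:
    print("fd", key.fd,
          "readable:", bool(events & selectors.EVENT_READ),
          "writable:", bool(events & selectors.EVENT_WRITE))

sel.unregister(a)
a.close()
```

In my case the equivalent wakeup happens inside asyncio's selector event loop rather than in my own code, which is why I can't see where to hook in and close the offending transport.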
-- https://mail.python.org/mailman/listinfo/python-list