On 06/03/16 10:24, Glyph wrote:
> On Jun 3, 2016, at 01:06, Nagy, Attila <b...@fsn.hu> wrote:
>> Hi,
>> I have a thread-safe synchronous library, which I would like to use
>> in a threadpool using deferToThread.
>> Without using (deferTo)threads I get consistent 1-3 ms response
>> times; with deferring to the threadpool I get 30-300, varying wildly.
> Why do you think this is bad performance?
> With a direct call, you are doing almost nothing. Just pushing a
> stack frame.
> With a deferToThread call, you are:
> [...]
Sure, this is not the perfect example; I just wanted to measure the
plain latency that this solution adds.
The whole picture is this:
I have an application which runs in uwsgi in multithreaded mode. It uses
the (blocking) elasticsearch client.
That app can serve queries with some tens of concurrent requests in
around 3 ms.
For various reasons I would like to rewrite this app in Twisted. If I use
the txes2 lib (which is nonblocking), I can achieve around the same
performance (although it varies a lot more). This is async; no threads
are involved.
My problem is that this library lacks several features, so I would like
to use the blocking one, which needs to run in threads.
When I do the requests in threads (with deferToThread, or just
callInThread for the whole handler), the response time is around 10-20
times higher than either uwsgi's threaded/blocking setup or Twisted's
async one, and it becomes highly unpredictable.
I haven't looked into the details of Twisted's thread pools, but what I
would expect here is the same behavior as with a simple Python thread pool
(like the one uwsgi uses, or the ones in the standard library), which
according to my results works much faster and more predictably than Twisted's.
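The stdlib comparison being made can be sketched like this: time the same callable through a `concurrent.futures` thread pool versus a direct call. `work` here is a toy stand-in for the ES query (the numbers quoted in this thread come from the real client, not from this sketch):

```python
# Sketch: overhead of a stdlib thread-pool round trip vs. a direct call.
import time
from concurrent.futures import ThreadPoolExecutor


def work():
    time.sleep(0.001)  # pretend ~1 ms of blocking I/O
    return "done"


def timed(fn):
    # Returns (result, elapsed time in microseconds).
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1e6


with ThreadPoolExecutor(max_workers=16) as pool:
    # Submit to the pool and block until the result comes back.
    pooled_result, pooled_us = timed(lambda: pool.submit(work).result())

direct_result, direct_us = timed(work)
```

The difference between `pooled_us` and `direct_us` is the hand-off cost of the pool itself, which is what the benchmark below tries to isolate for Twisted.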
BTW, I use queues in non-Twisted programs and they come nowhere near
causing several milliseconds(!) of latency.
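A quick way to check that claim with the stdlib: measure a full round trip of a message through two `queue.Queue` objects between threads. On a typical machine this hand-off costs tens of microseconds, not milliseconds (a rough sketch, not the list author's code):

```python
# Sketch: round-trip latency of a queue hand-off between two threads.
import queue
import threading
import time

requests, responses = queue.Queue(), queue.Queue()


def worker():
    # Echo items back until a None sentinel arrives.
    while True:
        item = requests.get()
        if item is None:
            break
        responses.put(item)


t = threading.Thread(target=worker)
t.start()

start = time.perf_counter()
requests.put("ping")
echo = responses.get()  # blocks until the worker echoes it back
latency_us = (time.perf_counter() - start) * 1e6

requests.put(None)  # shut the worker down
t.join()
```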
OK, here's a more realistic example:
https://gist.github.com/bra-fsn/08734197601e5a63d6a2b56d7b048119
This does what is described above: it runs an ES query through a Twisted
thread pool, and also calls it directly in the thread the loop runs in.
With one thread the overhead is somewhat acceptable:
deferToThread: avg 2051.00 us, sync: avg 1554.70 us, 1.32x increase
The direct call responds in 1.5 ms, while deferToThread returns in 2 ms.
Things get worse with concurrency.
With 16 threads the response time is 18 times that of the direct call
(51 ms vs 2.8 ms!):
deferToThread: avg 51515.36 us, sync: avg 2798.19 us, 18.41x increase
With 32 threads:
deferToThread: avg 108222.73 us, sync: avg 2922.28 us, 37.03x increase
I use plain (stdlib) thread pools elsewhere and I haven't seen this kind
of performance degradation.
100 ms is a lot of time...
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python