On 06/03/16 10:24, Glyph wrote:
>> On Jun 3, 2016, at 01:06, Nagy, Attila <b...@fsn.hu> wrote:
>>
>> Hi,
>>
>> I have a thread-safe synchronous library which I would like to use in a threadpool via deferToThread.
>>
>> Without using (deferTo)threads I get consistent 1-3 ms response times; when deferring to a threadpool I get 30-300 ms, varying wildly.
>
> Why do you think this is bad performance?
>
> With a direct call, you are doing almost nothing. Just pushing a stack frame.
>
> With a deferToThread call, you are:
> [...]

Sure, this is not a perfect example; I just wanted to measure the raw latency this approach adds.
The whole picture is this:
I have an application which runs in uwsgi in multithreaded mode. It uses the (blocking) elasticsearch client. That app can serve some tens of concurrent requests in around 3 ms each.

For various reasons I would like to rewrite this app in Twisted. If I use the txes2 library (which is non-blocking), I can achieve about the same performance (although it varies a lot more). This is fully async; no threads are involved.

My problem is that this library lacks several features, so I would like to use the blocking one, which needs to run in threads. When I do the requests in threads (with deferToThread, or by running the whole handler with callInThread), the response time is around 10-20 times higher than with either uwsgi's threaded blocking setup or Twisted's async one, and it becomes highly unpredictable.

I haven't looked into the details of Twisted's threadpools, but what I would expect here is the same behavior as with a plain Python threadpool (like the one uwsgi uses, or the ones in the standard library), which according to my results works much faster and more predictably than Twisted's.
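As a baseline, the dispatch overhead of a plain stdlib threadpool can be measured with a minimal sketch like this (it uses concurrent.futures rather than uwsgi's internal pool, and the task is a no-op stand-in for the blocking client call, so only the pool's round-trip overhead is captured):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    # No-op stand-in for the blocking Elasticsearch call; with no real work,
    # the timing below reflects only the pool's dispatch/return overhead.
    return None

def measure(pool, n=1000):
    # Average round-trip latency of submitting a task and waiting on its result.
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        pool.submit(task).result()
        total += time.perf_counter() - start
    return total / n * 1e6  # microseconds

with ThreadPoolExecutor(max_workers=16) as pool:
    avg_us = measure(pool)
    print("stdlib pool round-trip: %.1f us" % avg_us)
```

On an idle machine this usually lands in the tens of microseconds per call, i.e. well below the millisecond range being discussed.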

BTW, I use queues in non-Twisted programs and they come nowhere near causing several milliseconds(!) of latency.
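The queue claim is easy to check directly; here is a hedged, stdlib-only sketch that ping-pongs items between two threads through a pair of queues and reports the average round-trip time:

```python
import queue
import threading
import time

def echo_worker(inq, outq):
    # Worker thread: take an item from the request queue and push it back
    # on the response queue, mimicking a minimal request/response hop.
    while True:
        item = inq.get()
        if item is None:
            break
        outq.put(item)

inq, outq = queue.Queue(), queue.Queue()
t = threading.Thread(target=echo_worker, args=(inq, outq), daemon=True)
t.start()

n = 1000
start = time.perf_counter()
for i in range(n):
    inq.put(i)
    outq.get()
avg_us = (time.perf_counter() - start) / n * 1e6
inq.put(None)  # shut the worker down
t.join()
print("queue round-trip: %.2f us" % avg_us)
```

A full put/get round-trip between threads is typically on the order of tens of microseconds, so the queue itself cannot account for milliseconds of latency.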

OK, here's a more realistic example:
https://gist.github.com/bra-fsn/08734197601e5a63d6a2b56d7b048119

This does what is described above: it runs an ES query in a Twisted threadpool, and it also calls the query directly in the thread the whole loop runs in.

With one thread the overhead is somewhat acceptable:
deferToThread: avg 2051.00 us, sync: avg 1554.70 us, 1.32x increase
The direct call responds in 1.5 ms, while deferToThread returns in about 2 ms.

Things get worse with the concurrency.
With 16 threads the response time is 18 times that of the direct call (51 ms vs 2.8 ms!):
deferToThread: avg 51515.36 us, sync: avg 2798.19 us, 18.41x increase

With 32 threads:
deferToThread: avg 108222.73 us, sync: avg 2922.28 us, 37.03x increase
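For comparison, here is what I would expect from a plain stdlib pool under the same kind of load: per-request latency dominated by predictable queueing, not by pool overhead. This is only a sketch; fake_query is a time.sleep stand-in for the ~3 ms ES query, and the numbers are illustrative, not taken from the gist:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query():
    # Hypothetical stand-in for the ~3 ms blocking Elasticsearch query.
    time.sleep(0.003)

def avg_latency_ms(workers, requests=64):
    # Average submit-to-completion latency when `requests` calls are
    # thrown at a pool of `workers` threads all at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        def run():
            fake_query()
            return time.perf_counter()
        submits, futures = [], []
        for _ in range(requests):
            submits.append(time.perf_counter())
            futures.append(pool.submit(run))
        ends = [f.result() for f in futures]
        return sum(e - s for s, e in zip(submits, ends)) / requests * 1e3

for workers in (1, 16, 32):
    # Latency shrinks as workers grow: what remains is queueing delay,
    # which is predictable from the worker count, not pool overhead.
    print("%2d workers: avg %.2f ms" % (workers, avg_latency_ms(workers)))
```

With a plain pool, adding workers for the same offered load should reduce average latency roughly in proportion, which is the opposite of the degradation the deferToThread numbers above show.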

I use normal (stdlib) threadpools elsewhere and I haven't seen this kind of performance degradation.

100 ms is a lot of time...
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python