New submission from Bruce Merry <bme...@gmail.com>:

While investigating poor HTTP read performance, I discovered that reading all 
the data from a response with a Content-Length goes via _safe_read, which in 
turn reads in chunks of at most MAXAMOUNT (1 MB) before stitching them 
together with b"".join. This can really hurt performance for responses larger 
than MAXAMOUNT, because
(a) the data has to be copied an additional time; and
(b) the join operation doesn't release the GIL, so it limits multi-threaded 
scaling.

I'm struggling to see any advantage in doing this chunking; it's not saving 
memory either (in fact it wastes memory, since the chunks and the joined 
result are alive at the same time).
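
For reference, the pattern in question is roughly the following (a simplified 
sketch of the chunked read-and-join, not a verbatim copy of Lib/http/client.py):

    MAXAMOUNT = 1024 * 1024  # http.client caps each read at 1 MiB

    def safe_read_sketch(fp, amt):
        # Read exactly amt bytes from fp, at most MAXAMOUNT at a time.
        chunks = []
        while amt > 0:
            chunk = fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise EOFError("incomplete read")
            chunks.append(chunk)
            amt -= len(chunk)
        # The join copies all of the data a second time, and it holds the
        # GIL for the duration of the copy.
        return b"".join(chunks)

A single fp.read(amt) would avoid both the extra copy and the GIL-held join 
(the underlying socket read already releases the GIL while waiting for data).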

To give an idea of the performance impact: changing MAXAMOUNT to a very large 
value made a multithreaded test of mine go from 800 MB/s to 2.5 GB/s (at which 
point it was limited by the network speed).
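
The test was along these lines (a hypothetical reconstruction with placeholder 
host, path and thread count, not my actual benchmark):

    import http.client
    import threading
    import time

    HOST = "server.example"   # placeholder, not the host used in my test
    PATH = "/large-file"      # placeholder
    NTHREADS = 8              # placeholder thread count

    def fetch():
        conn = http.client.HTTPConnection(HOST)
        conn.request("GET", PATH)
        # For a response with a Content-Length, read() goes via _safe_read.
        body = conn.getresponse().read()
        conn.close()
        return len(body)

    def worker(results, i):
        results[i] = fetch()

    results = [0] * NTHREADS
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(NTHREADS)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    print("%.1f MB/s" % (sum(results) / elapsed / 1e6))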

----------
components: Library (Lib)
messages: 336081
nosy: bmerry
priority: normal
severity: normal
status: open
title: Why does http.client.HTTPResponse._safe_read use MAXAMOUNT
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36050>
_______________________________________