New submission from Bruce Merry <bme...@gmail.com>:
While investigating poor HTTP read performance I discovered that reading all the data from a response with a Content-Length goes via _safe_read, which in turn reads in chunks of at most MAXAMOUNT (1 MB) before stitching them together with b"".join. This can really hurt performance for responses larger than MAXAMOUNT, because (a) the data has to be copied an additional time; and (b) the join operation doesn't drop the GIL, so this limits multi-threaded scaling. I'm struggling to see any advantage in this chunking - it doesn't save memory either (in fact it wastes it). A rough sketch of the chunked approach versus a single-buffer read follows after the tracker footer.

To give an idea of the performance impact: changing MAXAMOUNT to a very large value made a multithreaded test of mine go from 800 MB/s to 2.5 GB/s (the latter limited by the network speed).

----------
components: Library (Lib)
messages: 336081
nosy: bmerry
priority: normal
severity: normal
status: open
title: Why does http.client.HTTPResponse._safe_read use MAXAMOUNT
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36050>
_______________________________________
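
A minimal sketch (not the actual stdlib code) of the two strategies, assuming a file-like object that supports read()/readinto(); the helper names are illustrative only:

    MAXAMOUNT = 1 << 20  # 1 MiB, the chunk size _safe_read currently uses

    def read_chunked_join(fp, amt):
        # Roughly what _safe_read does today: collect chunks of at most
        # MAXAMOUNT bytes, then b"".join them. The join copies every byte
        # a second time and holds the GIL while doing so.
        chunks = []
        while amt > 0:
            chunk = fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise EOFError("incomplete read")
            chunks.append(chunk)
            amt -= len(chunk)
        return b"".join(chunks)

    def read_into_buffer(fp, amt):
        # Alternative: preallocate one buffer and read directly into it,
        # so the payload is copied out of the socket buffers only once.
        buf = bytearray(amt)
        view = memoryview(buf)
        pos = 0
        while pos < amt:
            n = fp.readinto(view[pos:])
            if not n:
                raise EOFError("incomplete read")
            pos += n
        return bytes(buf)

With read_into_buffer the final bytes(buf) is the only additional copy; returning the bytearray (or a memoryview of it) directly would avoid even that.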