New submission from Inada Naoki <songofaca...@gmail.com>:
The original issue is reported here:
https://discuss.python.org/t/non-optimal-bz2-reading-speed/6869

1. Only BZ2File uses RLock(); lzma and gzip don't. The lock adds significant performance overhead. When I removed `with self._lock:`, decompression speed improved from about 148k lines/sec to 200k lines/sec.

2. The default __iter__ calls `readline()` for each iteration. BZ2File.readline() is implemented in Python, so it is slightly slower than a C implementation. When I added this `__iter__()` to BZ2File, decompression speed improved from about 148k lines/sec (or 200k lines/sec without the lock) to 500k lines/sec.

    def __iter__(self):
        self._check_can_read()
        return iter(self._buffer)

If this __iter__ method is safe, it can be added to gzip and lzma too.

----------
components: Library (Lib)
files: dec.py
messages: 390588
nosy: methane
priority: normal
severity: normal
status: open
title: bz2 performance issue.
versions: Python 3.10
Added file: https://bugs.python.org/file49948/dec.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43785>
_______________________________________
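
For anyone who wants to try the proposed `__iter__` without patching the standard library, here is a minimal sketch. It is not the attached dec.py. The subclass name `FastIterBZ2File` and the helper `make_sample` are hypothetical, and the code relies on the private CPython attributes `_check_can_read` and `_buffer` quoted in the report, which may change between versions. It only checks that both iteration paths yield the same lines; it makes no timing claims.

```python
import bz2
import io


class FastIterBZ2File(bz2.BZ2File):
    """Hypothetical subclass that applies the __iter__ proposed in the report."""

    def __iter__(self):
        # Private CPython internals (assumption: present in the running version).
        self._check_can_read()
        # Iterate the underlying buffered reader directly instead of
        # going through BZ2File.readline() once per line.
        return iter(self._buffer)


def make_sample(n_lines=1000):
    """Build a small bz2-compressed payload of newline-terminated lines."""
    raw = b"".join(b"line %d\n" % i for i in range(n_lines))
    return bz2.compress(raw)


payload = make_sample()

# Default iteration path.
with bz2.BZ2File(io.BytesIO(payload)) as f:
    default_lines = list(f)

# Proposed iteration path.
with FastIterBZ2File(io.BytesIO(payload)) as f:
    fast_lines = list(f)
```

To compare speed, wrapping each of the two `with` blocks in `timeit.timeit` on a larger payload should be enough; results will depend on the Python version and the data.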