[issue23224] bz2/lzma: Compressor/Decompressor objects are only initialized in __init__
Ma Lin added the comment: These can be done in the .__new__() method:
- create the thread lock
- create the (de)compression context
- initialize the (de)compressor state
In the .__init__() method, only set the (de)compression parameters. And prevent the .__init__() method from being called multiple times. This mode works fine in my pyzstd module (a Python binding to the zstd library). But I think very few people will encounter this problem, so we can leave it. -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue23224> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
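The pattern described above can be sketched in pure Python (the class and attribute names here are hypothetical illustrations, not pyzstd's actual API):

```python
import threading

class Compressor:
    def __new__(cls, *args, **kwargs):
        # Allocate resources that must exist exactly once, before
        # __init__ can run (or be re-run) from user code.
        self = super().__new__(cls)
        self._lock = threading.Lock()
        self._initialized = False
        return self

    def __init__(self, level=3):
        # Only set parameters here, and refuse to run twice.
        if self._initialized:
            raise RuntimeError("__init__ called twice on Compressor")
        self._initialized = True
        self.level = level
```

With this split, a buggy explicit `c.__init__()` call cannot re-create the lock or leak the context; it simply raises.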
[issue46255] Remove unnecessary check in _IOBase._check*() methods
New submission from Ma Lin : These methods are METH_NOARGS, in all cases the second parameter will be NULL.

{"_checkClosed",   _PyIOBase_check_closed,   METH_NOARGS},
{"_checkSeekable", _PyIOBase_check_seekable, METH_NOARGS},
{"_checkReadable", _PyIOBase_check_readable, METH_NOARGS},
{"_checkWritable", _PyIOBase_check_writable, METH_NOARGS},

-- components: IO messages: 409672 nosy: malin priority: normal severity: normal status: open title: Remove unnecessary check in _IOBase._check*() methods versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue46255> ___
[issue46255] Remove unnecessary check in _IOBase._check*() methods
Change by Ma Lin : -- keywords: +patch pull_requests: +28606 stage: -> patch review pull_request: https://github.com/python/cpython/pull/30397 ___ Python tracker <https://bugs.python.org/issue46255> ___
[issue46255] Remove unnecessary check in _IOBase._check*() methods
Change by Ma Lin : -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue46255> ___
[issue47040] Remove an invalid versionchanged in doc
New submission from Ma Lin : Since CPython 3.0.0, the checksums are always truncated to `unsigned int`: https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930 https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950 -- assignee: docs@python components: Documentation, Library (Lib) messages: 415386 nosy: docs@python, gregory.p.smith, malin priority: normal severity: normal status: open title: Remove an invalid versionchanged in doc versions: Python 3.10, Python 3.11, Python 3.9 ___ Python tracker <https://bugs.python.org/issue47040> ___
[issue47040] Remove an invalid versionchanged in doc
Change by Ma Lin : -- keywords: +patch pull_requests: +30046 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31955 ___ Python tracker <https://bugs.python.org/issue47040> ___
[issue47040] Remove invalid versionchanged in doc
Ma Lin added the comment: The `binascii.crc32` doc also has this invalid note: doc: https://docs.python.org/3/library/binascii.html#binascii.crc32 3.0.0 code: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035 In addition, `binascii.crc32` has a `USE_ZLIB_CRC32` code path, but it's buggy. The length parameter of zlib's `crc32()` function is `unsigned int`, so if the `USE_ZLIB_CRC32` code path is used and the data is > 4 GiB, the result is wrong. Should we remove the `USE_ZLIB_CRC32` code path in `binascii.c`, or fix it? `USE_ZLIB_CRC32` code path in binascii.c (buggy code): https://github.com/python/cpython/blob/v3.11.0a6/Modules/binascii.c#L756-L767 crc32 in zlibmodule.c, which uses a UINT_MAX sliding window (correct code): https://github.com/python/cpython/blob/v3.11.0a6/Modules/zlibmodule.c#L1436 -- title: Remove an invalid versionchanged in doc -> Remove invalid versionchanged in doc ___ Python tracker <https://bugs.python.org/issue47040> ___
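The UINT_MAX sliding-window fix can be mimicked at the Python level by feeding `zlib.crc32` in bounded chunks and carrying the running CRC forward (a sketch; the helper name and chunk size are made up for illustration):

```python
import zlib

def crc32_chunked(data, crc=0, chunk=1 << 30):
    # Feed the data to zlib.crc32 in chunks, carrying the running
    # CRC forward, so no single C-level call sees a length that
    # would overflow an unsigned int.
    view = memoryview(data)
    for start in range(0, len(view), chunk):
        crc = zlib.crc32(view[start:start + chunk], crc)
    return crc
```

Because CRC-32 is resumable via its second argument, the chunked result is identical to a single full-buffer call.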
[issue44439] stdlib wrongly uses len() for bytes-like object
Ma Lin added the comment: The `_Stream.write` method in tarfile.py also has this code: https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434 But this bug will not be triggered, since callers always pass bytes data to this method. The `_ConnectionBase.send_bytes` method in multiprocessing\connection.py can be micro-optimized: https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L193 That can be done in another issue, so I think this issue can be closed. -- stage: patch review -> resolved status: pending -> closed ___ Python tracker <https://bugs.python.org/issue44439> ___
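The bug class being discussed: `len()` of a memoryview counts items, not bytes, so using it as a byte count is wrong for non-bytewise buffers. A small demonstration:

```python
import array

a = array.array('d', [1.0, 2.0, 3.0])  # 3 items, 8 bytes each
m = memoryview(a)

# len() counts items, not bytes -- using it as a byte count is the bug.
assert len(m) == 3
assert m.nbytes == 24

# Casting to 'B' (unsigned bytes) gives a bytewise view without copying.
b = m.cast('B')
assert len(b) == m.nbytes == 24
```

Passing `bytes` objects hides the bug because for them `len()` and the byte count coincide.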
[issue47040] Fix confusing versionchanged note in crc32 and adler32
Change by Ma Lin : -- pull_requests: +30090 pull_request: https://github.com/python/cpython/pull/32002 ___ Python tracker <https://bugs.python.org/issue47040> ___
[issue47040] Fix confusing versionchanged note in crc32 and adler32
Ma Lin added the comment: PR 32002 is for the 3.10/3.9 branches. -- ___ Python tracker <https://bugs.python.org/issue47040> ___
[issue46864] Deprecate ob_shash in BytesObject
Ma Lin added the comment: If this code is run, would it be slower?

bytes_hash = hash(bytes_data)
bytes_hash = hash(bytes_data)  # get the hash twice

-- nosy: +malin ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Ma Lin added the comment: Since hash() is a public function, maybe some users use the hash value to manage bytes objects in their own way; then there may be a performance regression. A rough example, dispatching data to 16 servers:

h = hash(b)
sendto(server_number=h & 0xF, data=b)

-- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Ma Lin added the comment: RAM is now relatively cheaper than CPU. 1 million bytes objects additionally use 7.629 MiB of RAM for ob_shash (1_000_000*8/1024/1024). Removing it causes a hash() performance regression anyway. -- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue46864] Deprecate ob_shash in BytesObject
Ma Lin added the comment: If a bytes object is put into multiple dicts/sets, the hash needs to be computed multiple times. This seems like a common usage. bytes is a very basic type; users may use it in various ways. And unskilled users may check the same bytes object against dicts/sets many times. FYI, for 1 GiB of data:

function          seconds
hash()            0.40
binascii.crc32()  1.66  (Gregory P. Smith is trying to improve this)
zlib.crc32()      0.65

-- ___ Python tracker <https://bugs.python.org/issue46864> ___
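A sketch of why this matters: containers request the key's hash on every insertion. For exact bytes objects the cached ob_shash makes the repeats free; the counting subclass below (a hypothetical illustration that bypasses the C-level cache) just makes those calls visible:

```python
calls = 0

class CountedBytes(bytes):
    # A bytes subclass that counts how often its hash is requested.
    def __hash__(self):
        global calls
        calls += 1
        return super().__hash__()

b = CountedBytes(b"some payload")
d1 = {b: 1}
d2 = {b: 2}
s = {b}
assert calls == 3  # one hash request per container insertion
```

Without a cached hash field, each of those three requests would rescan the whole byte string.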
[issue46864] Deprecate ob_shash in BytesObject
Ma Lin added the comment: > I posted remove-bytes-hash.patch in this issue. Would you measure how this > affects whole application performance rather than micro benchmarks? I guess there is not much difference in benchmarks. But if a bytes object is put into multiple dicts/sets and len(bytes_key) is large, it will take a long time. (1 GiB takes 0.40 seconds on i5-11500, DDR4-3200) The length of bytes can be arbitrary, so the computing time may be very different. Is it possible to let code objects use another type? In addition to ob_shash, maybe the extra byte \x00 at the end can be saved. -- ___ Python tracker <https://bugs.python.org/issue46864> ___
[issue35859] Capture behavior depends on the order of an alternation
Ma Lin added the comment: Thanks for your review. 3.11 has a more powerful re module; thank you also for rebasing the atomic grouping code. -- ___ Python tracker <https://bugs.python.org/issue35859> ___
[issue23689] Memory leak in Modules/sre_lib.h
Ma Lin added the comment: The approaches in my PRs are suboptimal, so I closed them. The number of REPEATs can be counted when compiling a pattern, and a `SRE_REPEAT` array (with that number of items) can be allocated in `SRE_STATE`. It seems that at any time only one instance of a REPEAT is active, so a `SRE_REPEAT` array is fine. The regex module does it like this: https://github.com/mrabarnett/mrab-regex/blob/hg/regex_3/_regex.c#L18287-L18288 Can the number of REPEATs be placed in `SRE_OP_INFO`? And add a field to `SRE_OP_REPEAT` to indicate the index of this REPEAT. -- ___ Python tracker <https://bugs.python.org/issue23689> ___
[issue47152] Reorganize the re module sources
Ma Lin added the comment: Please don't merge too close to the 3.11 beta1 release date; I'll submit PRs after this is merged. -- ___ Python tracker <https://bugs.python.org/issue47152> ___
[issue23689] Memory leak in Modules/sre_lib.h
Change by Ma Lin : -- pull_requests: +30265 pull_request: https://github.com/python/cpython/pull/32188 ___ Python tracker <https://bugs.python.org/issue23689> ___
[issue47152] Reorganize the re module sources
Change by Ma Lin : -- pull_requests: +30266 pull_request: https://github.com/python/cpython/pull/32188 ___ Python tracker <https://bugs.python.org/issue47152> ___
[issue23689] Memory leak in Modules/sre_lib.h
Change by Ma Lin : -- pull_requests: +30298 pull_request: https://github.com/python/cpython/pull/32223 ___ Python tracker <https://bugs.python.org/issue23689> ___
[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method
New submission from Ma Lin : `bytes(m)` can be replaced by `memoryview.cast('B')`; then there is no need to copy the data.

m = memoryview(buf)
# HACK for byte-indexing of non-bytewise buffers (e.g. array.array)
if m.itemsize > 1:
    m = memoryview(bytes(m))
n = len(m)

https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L190-L194 -- components: Library (Lib) messages: 416538 nosy: malin priority: normal severity: normal status: open title: multiprocessing: micro-optimize Connection.send_bytes() method type: resource usage versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue47199> ___
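A sketch of the suggested change: `cast('B')` yields a zero-copy bytewise view of a contiguous buffer, whereas `bytes(m)` copies the whole buffer first (a non-contiguous buffer would still need a copy):

```python
import array

buf = array.array('d', [1.0, 2.0, 3.0])
m = memoryview(buf)

# Old approach: bytes(m) copies the buffer just to get byte indexing.
copied = bytes(m)

# Suggested approach: cast to 'B' -- a zero-copy bytewise view.
if m.itemsize > 1:
    m = m.cast('B')

assert m.itemsize == 1
assert len(m) == len(copied) == buf.itemsize * 3
assert m.tobytes() == copied
```

For large `send_bytes()` payloads backed by multi-byte buffers, skipping that copy is the whole micro-optimization.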
[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method
Change by Ma Lin : -- keywords: +patch pull_requests: +30318 stage: -> patch review pull_request: https://github.com/python/cpython/pull/32247 ___ Python tracker <https://bugs.python.org/issue47199> ___
[issue47152] Reorganize the re module sources
Ma Lin added the comment: In the `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h files. Will they be put into a folder? -- ___ Python tracker <https://bugs.python.org/issue47152> ___
[issue23689] Memory leak in Modules/sre_lib.h
Change by Ma Lin : -- pull_requests: +30344 pull_request: https://github.com/python/cpython/pull/32283 ___ Python tracker <https://bugs.python.org/issue23689> ___
[issue47152] Reorganize the re module sources
Ma Lin added the comment: Match.regs is an undocumented attribute; it seems to have existed since 1991. Can it be removed? https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871 -- ___ Python tracker <https://bugs.python.org/issue47152> ___
[issue47152] Reorganize the re module sources
Ma Lin added the comment: > cryptic name In very early versions, "mark" was called register/region. https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52 If a span is accessed repeatedly, it's faster than Match.span(). Maybe consider renaming it and making it a public attribute. -- ___ Python tracker <https://bugs.python.org/issue47152> ___
[issue47211] Remove re.template() and re.TEMPLATE
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue47211> ___
[issue47248] Possible slowdown of regex searching in 3.11
Ma Lin added the comment: Could you give the two versions? I will do a git bisect. I tested 356997c~1 and 356997c [1], msvc2022 non-pgo release build:

# regex_dna
Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower
Not significant

# regex_effbot
Mean +- std dev: 2.47 ms +- 0.01 ms -> 2.46 ms +- 0.02 ms: 1.00x faster
Not significant

# regex_v8
Mean +- std dev: 21.7 ms +- 0.1 ms -> 22.4 ms +- 0.1 ms: 1.03x slower
Significant (t=-30.82)

[1] https://github.com/python/cpython/commit/35699721a3391175d20e9ef03d434675b496 -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue47248> ___
[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).
New submission from Ma Lin : These changes reduce sizeof(match_context):
- 32-bit build: 36 bytes, no change.
- 64-bit build: 72 bytes -> 56 bytes.
sre uses a stack and the `match_context` struct to simulate recursive calls; a smaller struct brings:
- deeper recursion
- less memory consumption
- fewer memory reallocs
Here is a test: if the stack size is limited to 1 GiB, the max available value of n is:

re.match(r'(ab)*', n * 'ab')  # need to save MARKs
72 bytes: n = 11,184,808
64 bytes: n = 12,201,609
56 bytes: n = 13,421,770

re.match(r'(?:ab)*', n * 'ab')  # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,078
56 bytes: n = 16,777,213

1,073,741,823 capturing groups should be enough for almost all users. If it were limited to 16,383 (2-byte integer), the context size might be reduced further, but some patterns generated by programs may have more capturing groups than that.

1️⃣ Performance:

Before
regex_dna: Mean +- std dev: 149 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.22 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark[1]: 13.9 sec +- 0.0 sec

Commit 1. limit the maximum capture group to 1,073,741,823
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.0 sec

Commit 2. further reduce sizeof(SRE(match_context))
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.2 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.1 sec

If the types of toplevel/jump are further changed from int to char, in a 32-bit build sizeof(match_context) is reduced from 36 to 32 (in a 64-bit build it is still 56). But it's slower on a 64-bit build, so I didn't adopt it:
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.18 ms +- 0.01 ms
regex_v8: Mean +- std dev: 22.4 ms +- 0.1 ms
my benchmark: 14.1 sec +- 0.0 sec

2️⃣ The type of match_context.count is Py_ssize_t
- If it were changed to a 4-byte integer, some engine code would need to be modified.
- If it is kept as Py_ssize_t, SRE_MAXREPEAT may be >= 4 GiB in future versions.
Currently SRE_MAXREPEAT can't be >= 4 GiB, so the type of match_context.count is unchanged.

[1] My re benchmark; it uses 16 patterns to process 100 MiB of text data: https://github.com/animalize/re_benchmarks -- components: Library (Lib) messages: 416960 nosy: ezio.melotti, malin, mrabarnett, serhiy.storchaka priority: normal severity: normal status: open title: re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context). type: resource usage versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue47256> ___
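A plausible reading of the chosen limit, assuming it was picked so that mark indices (two per capturing group) still fit in a signed 32-bit integer:

```python
MAX_GROUPS = 1_073_741_823

# The limit is 2**30 - 1.
assert MAX_GROUPS == 2**30 - 1

# Each capturing group needs two marks (start and end), so the
# highest mark index is 2*MAX_GROUPS + 1, which must still fit in
# a signed 32-bit integer:
assert 2 * MAX_GROUPS + 1 <= 2**31 - 1
```

The bound is exactly tight: `2*(2**30 - 1) + 1 == 2**31 - 1`, so no larger group count would fit.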
[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).
Change by Ma Lin : -- keywords: +patch pull_requests: +30437 stage: -> patch review pull_request: https://github.com/python/cpython/pull/32411 ___ Python tracker <https://bugs.python.org/issue47256> ___
[issue47248] Possible slowdown of regex searching in 3.11
Ma Lin added the comment: > Possibly related to the new atomic grouping support from GH-31982? It seems unlikely. I will do some benchmarks for this issue; more information (version/platform) is welcome. -- ___ Python tracker <https://bugs.python.org/issue47248> ___
[issue37907] speed-up PyLong_As*() for large longs
Change by Ma Lin : -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue37907> ___
[issue38015] inline function generates slightly inefficient machine code
New submission from Ma Lin : Commit 5e63ab0 replaces a macro with this inline function:

static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257) However, when invoking this function and `sizeof(value) < sizeof(long long)`, there is an unnecessary type cast. For example, on a 32-bit platform, if `value` is `Py_ssize_t`, it needs to be converted to the 8-byte `long long` type. The following assembly code is the beginning part of the `PyLong_FromSsize_t(Py_ssize_t v)` function. (32-bit x86 build generated by GCC 9.2, with the `-m32 -O2` options)

Using the macro, before commit 5e63ab0:

        mov     eax, DWORD PTR [esp+4]
        add     eax, 5
        cmp     eax, 261
        ja      .L2
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints
        add     DWORD PTR [eax], 1
        ret
.L2:    jmp     PyLong_FromSsize_t_rest(int)

Using the inlined function:

        push    ebx
        mov     eax, DWORD PTR [esp+8]
        mov     edx, 261
        mov     ecx, eax
        mov     ebx, eax
        sar     ebx, 31
        add     ecx, 5
        adc     ebx, 0
        cmp     edx, ecx
        mov     edx, 0
        sbb     edx, ebx
        jc      .L7
        cwde
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints+80
        add     DWORD PTR [eax], 1
        pop     ebx
        ret
.L7:    pop     ebx
        jmp     PyLong_FromSsize_t_rest(int)

On 32-bit x86 platforms, the 8-byte `long long` is implemented using two registers, so the machine code is much longer than the macro version. At least these hot functions suffer from this:

PyObject* PyLong_FromSsize_t(Py_ssize_t v)
PyObject* PyLong_FromLong(long v)

Replacing the inline function with a macro version will fix this:

#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

If you want to see the assembly code generated by major compilers, you can paste the attached demo.c into https://godbolt.org/
- demo.c was originally written by Greg Price.
- use `-m32 -O2` to generate a 32-bit build.
-- components: Interpreter Core files: demo.c messages: 351052 nosy: Greg Price, Ma Lin, aeros167, mark.dickinson, rhettinger, sir-sigurd priority: normal severity: normal status: open title: inline function generates slightly inefficient machine code versions: Python 3.9 Added file: https://bugs.python.org/file48583/demo.c ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38015] inline function generates slightly inefficient machine code
Ma Lin added the comment: There will always be a new commit; replacing it with a macro version also looks good. I have no opinion, both are fine. -- ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38037] Assertion failed: object has negative ref count
New submission from Ma Lin : Adding these two lines to /Objects/longobject.c will disable the "preallocated small integer pool":

#define NSMALLPOSINTS 0
#define NSMALLNEGINTS 0

Then run this reproduce code (attached):

from enum import IntEnum
import _signal

class Handlers(IntEnum):
    A = _signal.SIG_DFL
    B = _signal.SIG_IGN

When the interpreter exits, you will get this error:

d:\dev\cpython\PCbuild\win32>python_d.exe d:\a.py
d:\dev\cpython\include\object.h:541: _Py_NegativeRefcount: Assertion failed: object has negative ref count
Fatal Python error: _PyObject_AssertFailed
Current thread 0x200c (most recent call first):

The 3.8 and 3.9 branches are affected. I'm sorry, this issue is beyond my ability. -- files: reproduce.py messages: 351196 nosy: Ma Lin priority: normal severity: normal status: open title: Assertion failed: object has negative ref count versions: Python 3.8, Python 3.9 Added file: https://bugs.python.org/file48594/reproduce.py ___ Python tracker <https://bugs.python.org/issue38037> ___
[issue38037] Assertion failed: object has negative ref count
Change by Ma Lin : -- keywords: +patch pull_requests: +15355 stage: -> patch review pull_request: https://github.com/python/cpython/pull/15701 ___ Python tracker <https://bugs.python.org/issue38037> ___
[issue38037] Assertion failed: object has negative ref count
Ma Lin added the comment: I did a Git bisect; this is the first bad commit: https://github.com/python/cpython/commit/9541bd321a94f13dc41163a5d7a1a847816fac84 Added the involved devs to the nosy list. -- nosy: +berker.peksag, nanjekyejoannah ___ Python tracker <https://bugs.python.org/issue38037> ___
[issue38015] inline function generates slightly inefficient machine code
Change by Ma Lin : -- keywords: +patch pull_requests: +15365 stage: -> patch review pull_request: https://github.com/python/cpython/pull/15710 ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38015] inline function generates slightly inefficient machine code
Ma Lin added the comment: Revert commit 5e63ab0 or use PR 15710, both are fine. -- ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38015] inline function generates slightly inefficient machine code
Ma Lin added the comment: This range has not been changed since the "preallocated small integer pool" was introduced:

#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

The commit (Jan 2007): https://github.com/python/cpython/commit/ddefaf31b366ea84250fc5090837c2b764a04102 Is it worth increasing the range? FYI, built with MSVC 2017, the `small_ints` sizes:

32-bit build:
    sizeof(PyLongObject): 16 bytes
    sizeof(small_ints): 4192 bytes
64-bit build:
    sizeof(PyLongObject): 32 bytes
    sizeof(small_ints): 8384 bytes

-- ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38037] reference counter issue in signal module
Change by Ma Lin : -- title: Assertion failed: object has negative ref count -> reference counter issue in signal module ___ Python tracker <https://bugs.python.org/issue38037> ___
[issue26868] Document PyModule_AddObject's behavior on error
Change by Ma Lin : -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue26868> ___
[issue38015] inline function generates slightly inefficient machine code
Ma Lin added the comment: > This change produces tiny, but measurable speed-up for handling small ints I didn't get a measurable change. I ran this command a dozen times and took the best result: D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)" --duplicate=1000

before: Mean +- std dev: 771 ns +- 16 ns
after:  Mean +- std dev: 770 ns +- 10 ns

Environment: 64-bit release build by MSVC 2017, CPU: i3 4160, System: latest Windows 10 64-bit. Checking the machine code from godbolt.org: x64 MSVC v19.14 only saves one instruction: movsxd rax, ecx x86-64 GCC 9.2 saves two instructions: lea eax, [rdi+5] cdqe -- ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue38056] Add examples for common text encoding Error Handlers
New submission from Ma Lin : The text descriptions of `Error Handlers` are not very friendly to novices. https://docs.python.org/3/library/codecs.html#error-handlers For example:

'xmlcharrefreplace' Replace with the appropriate XML character reference (only for encoding). Implemented in :func:`xmlcharrefreplace_errors`.
'backslashreplace' Replace with backslashed escape sequences. Implemented in :func:`backslashreplace_errors`.
'namereplace' Replace with ``\N{...}`` escape sequences (only for encoding). Implemented in :func:`namereplace_errors`.

Novices may not know what these are. Giving some examples may help the reader understand more intuitively. A picture of the effect is attached. I picked two characters: ß https://www.compart.com/en/unicode/U+00DF ♬ https://www.compart.com/en/unicode/U+266C -- assignee: docs@python components: Documentation files: effect.png messages: 351329 nosy: Ma Lin, docs@python priority: normal severity: normal status: open title: Add examples for common text encoding Error Handlers versions: Python 3.7, Python 3.8, Python 3.9 Added file: https://bugs.python.org/file48599/effect.png ___ Python tracker <https://bugs.python.org/issue38056> ___
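For instance, encoding the two characters mentioned to ASCII with each handler gives:

```python
# U+00DF LATIN SMALL LETTER SHARP S, U+266C BEAMED SIXTEENTH NOTES
text = "\u00df\u266c"

# XML character references (decimal code points):
assert text.encode("ascii", errors="xmlcharrefreplace") == b"&#223;&#9836;"

# Backslashed escape sequences:
assert text.encode("ascii", errors="backslashreplace") == b"\\xdf\\u266c"

# \N{...} escape sequences with the official Unicode names:
assert text.encode("ascii", errors="namereplace") == (
    b"\\N{LATIN SMALL LETTER SHARP S}" b"\\N{BEAMED SIXTEENTH NOTES}"
)
```

Concrete before/after pairs like these make the handlers' behavior obvious at a glance.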
[issue38056] Add examples for common text encoding Error Handlers
Change by Ma Lin : -- keywords: +patch pull_requests: +15386 stage: -> patch review pull_request: https://github.com/python/cpython/pull/15732 ___ Python tracker <https://bugs.python.org/issue38056> ___
[issue38037] reference counter issue in signal module
Change by Ma Lin : -- pull_requests: +15407 pull_request: https://github.com/python/cpython/pull/15753 ___ Python tracker <https://bugs.python.org/issue38037> ___
[issue38015] inline function generates slightly inefficient machine code
Ma Lin added the comment: PR 15710 has been merged into master, but the merge message is not shown here. Commit: https://github.com/python/cpython/commit/6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2 Maybe this issue can be closed. -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue38015> ___
[issue21872] LZMA library sometimes fails to decompress a file
Ma Lin added the comment: Some memos: 1. In liblzma, these missing bytes were copied inside the `dict_repeat` function:

788 case SEQ_COPY:
789     // Repeat len bytes from distance of rep0.
790     if (unlikely(dict_repeat(&dict, rep0, &len))) {

See liblzma's source code (xz-5.2 branch): https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/lzma/lzma_decoder.c 2. The replies above said xz's command line tools can extract the problematic files successfully. This is because xz checks `if (avail_out == 0)` first, then checks `if (avail_in == 0)`. See the `uncompress` function in this source code (xz-5.2 branch): https://git.tukaani.org/?p=xz.git;a=blob;f=src/xzdec/xzdec.c;hb=refs/heads/v5.2 This check order just avoids the problem. -- ___ Python tracker <https://bugs.python.org/issue21872> ___
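The check-order point can be illustrated at the Python level with `lzma.LZMADecompressor` (a hedged sketch: the helper name and chunk sizes are made up, and it assumes a complete, well-formed stream — a truncated stream would loop waiting for input):

```python
import lzma

def decompress_chunked(compressed, in_chunk=8192, out_chunk=8192):
    # Mirror xzdec's loop order: always ask the decompressor for
    # more output first, and only feed input when it asks for it
    # (needs_input), instead of checking the input position first.
    d = lzma.LZMADecompressor()
    pos = 0
    parts = []
    while not d.eof:
        data = compressed[pos:pos + in_chunk] if d.needs_input else b""
        pos += len(data)
        parts.append(d.decompress(data, max_length=out_chunk))
    return b"".join(parts)
```

Driving the loop by `needs_input`/`max_length` means pending output buffered inside the decompressor is always drained before more input is demanded, which is exactly why xz's ordering avoids the failure.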
[issue38205] Python no longer compiles without small integer singletons
Ma Lin added the comment: This commit changed Py_UNREACHABLE() five days ago: https://github.com/python/cpython/commit/3ab61473ba7f3dca32d779ec2766a4faa0657923 If this change is removed, it compiles successfully. -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue38205> ___
[issue38205] Python no longer compiles without small integer singletons
Ma Lin added the comment: We can change Py_UNREACHABLE() to assert(0) in longobject.c Or remove the article in Py_UNREACHABLE() -- ___ Python tracker <https://bugs.python.org/issue38205> ___
[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)
Ma Lin added the comment: > It's not clear to me if anyone benchmarked to see if the > conversion to a macro had any measurable performance benefit. I tested it that day, also using this command: python.exe -m pyperf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)" --duplicate=1000 I remember the results were: inline function: 1.6 us, macro version: 1.27 us (32-bit release build with MSVC 2017). Since the difference was so obvious, I tested each version only once. -- ___ Python tracker <https://bugs.python.org/issue37812> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)
Ma Lin added the comment: > I agree that both changes should be reverted. There is another commit after those two commits: https://github.com/python/cpython/commit/c6734ee7c55add5fdc2c821729ed5f67e237a096 Reverting them would be troublesome. PR 16146 is ongoing; maybe we can ask its author to replace `Py_UNREACHABLE()` with `assert(0)`. -- ___ Python tracker <https://bugs.python.org/issue37812> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38205] Python no longer compiles without small integer singletons
Ma Lin added the comment: If a static inline function uses Py_UNREACHABLE() inside an if-else branch that should return a value, the compiler may emit a warning: https://godbolt.org/z/YtcNSf MSVC v19.14: warning C4715: 'test': not all control paths return a value clang 8.0.0: warning: control may reach end of non-void function [-Wreturn-type] Other compilers (gcc, icc) don't emit this warning. This situation occurs in real code: https://github.com/python/cpython/blob/v3.8.0b4/Include/object.h#L600 https://github.com/python/cpython/blob/v3.8.0b4/Objects/longobject.c#L3088 -- ___ Python tracker <https://bugs.python.org/issue38205> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38205] Python no longer compiles without small integer singletons
Change by Ma Lin : -- keywords: +patch pull_requests: +15860 stage: -> patch review pull_request: https://github.com/python/cpython/pull/16270 ___ Python tracker <https://bugs.python.org/issue38205> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38205] Python no longer compiles without small integer singletons
Ma Lin added the comment: PR 16270 uses Py_UNREACHABLE() in a single line. It solves this particular issue. -- ___ Python tracker <https://bugs.python.org/issue38205> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)
Ma Lin added the comment: Recent commits for longobject.c:

Revision: 5e63ab05f114987478a21612d918a1c0276fe9d2
Author: Greg Price
Date: 19-8-25 1:19:37
Message: bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return is explicit. (GH-15216)

The concern in this issue is the implicit return from a macro. We can add a comment before the call sites of the CHECK_SMALL_INT macro to explain that there is a possible return.

Revision: 6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2
Author: animalize
Date: 19-9-6 14:00:56
Message: replace inline function `is_small_int` with a macro version (GH-15710)

Then this commit would not be necessary.

Revision: c6734ee7c55add5fdc2c821729ed5f67e237a096
Author: Sergey Fedoseev
Date: 19-9-12 22:41:14
Message: bpo-37802: Slightly improve perfomance of PyLong_FromUnsigned*() (GH-15192)

This commit introduced a compiler warning due to this line [1]:

d:\dev\cpython\objects\longobject.c(412): warning C4244: 'function': conversion from 'unsigned long' to 'sdigit', possible loss of data

[1] the line: return get_small_int((ival)); \
https://github.com/python/cpython/blob/master/Objects/longobject.c#L386

Revision: 42acb7b8d29d078bc97b0cfd7c4911b2266b26b9
Author: HongWeipeng <961365...@qq.com>
Date: 19-9-18 23:10:15
Message: bpo-35696: Simplify long_compare() (GH-16146)

IMO this commit reduces readability a bit.

We can sort out these problems.

--
___ Python tracker <https://bugs.python.org/issue37812> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35696] remove unnecessary operation in long_compare()
Ma Lin added the comment: > I'd fix them, but I'm not sure if we are going to restore CHECK_SMALL_INT() > ¯\_(ツ)_/¯ I suggest we slow down and carefully sort out the recent commits to longobject.c: https://bugs.python.org/issue37812#msg352837 Make the code consistent in style and more readable... -- ___ Python tracker <https://bugs.python.org/issue35696> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build
New submission from Ma Lin : The C type `long` is a 4-byte integer in 64-bit Windows builds. [1] But the `ucs1lib_find_max_char()` function [2] uses SIZEOF_LONG, so it loses a little performance in 64-bit Windows builds. Below is a benchmark of using SIZEOF_SIZE_T and this change:

    - unsigned long value = *(unsigned long *) _p;
    + size_t value = *(size_t *) _p;

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b=b'a'*10_000_000; f=b.decode;" "f('latin1')"

    before: 5.83 ms +- 0.05 ms
    after : 5.58 ms +- 0.06 ms

[1] https://stackoverflow.com/questions/384502
[2] https://github.com/python/cpython/blob/v3.8.0b4/Objects/stringlib/find_max_char.h#L9

There may be more possible optimizations, so I didn't prepare a PR for this. -- components: Interpreter Core messages: 352970 nosy: Ma Lin, inada.naoki, serhiy.storchaka, sir-sigurd priority: normal severity: normal status: open title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build type: performance versions: Python 3.9 ___ Python tracker <https://bugs.python.org/issue38252> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build
Ma Lin added the comment: Maybe @sir-sigurd can find more optimizations. FYI, the `_Py_bytes_isascii()` function [1] also has similar code. [1] https://github.com/python/cpython/blob/v3.8.0b4/Objects/bytes_methods.c#L104 -- ___ Python tracker <https://bugs.python.org/issue38252> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
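The idea behind these functions can be sketched in Python (a hypothetical illustration; the real code is C and reads one `size_t`-sized word per iteration): test 8 bytes at a time by AND-ing with a high-bit mask, falling back to a per-byte check for the unaligned tail.

```python
import struct

ASCII_MASK = 0x8080808080808080  # high bit of every byte in an 8-byte word

def is_ascii(data: bytes) -> bool:
    # Word-at-a-time scan: check 8 bytes per iteration, then a per-byte tail.
    n = len(data) // 8 * 8
    for (word,) in struct.iter_unpack("<Q", data[:n]):
        if word & ASCII_MASK:      # any high bit set -> a non-ASCII byte
            return False
    return all(b < 0x80 for b in data[n:])

assert is_ascii(b"x" * 1000)
assert not is_ascii("é".encode("utf-8"))
```

Widening the word from 4 bytes (`long` on 64-bit Windows) to 8 bytes (`size_t`) halves the number of loop iterations, which is where the speedup in the benchmark above comes from.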
[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build
Change by Ma Lin : -- keywords: +patch pull_requests: +15911 stage: -> patch review pull_request: https://github.com/python/cpython/pull/16334 ___ Python tracker <https://bugs.python.org/issue38252> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38252] Use 8-byte step to detect ASCII sequence in 64bit Windows builds
Change by Ma Lin : -- title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build -> Use 8-byte step to detect ASCII sequence in 64bit Windows builds ___ Python tracker <https://bugs.python.org/issue38252> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38252] Use 8-byte step to detect ASCII sequence in 64bit Windows builds
Ma Lin added the comment: There are 4 functions with similar code; see PR 16334. Just replacing the `unsigned long` type with `size_t` gives these benchmarks. Can this be backported to the 3.8 branch?

1. bytes.isascii()

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.isascii;" "f()"

+-----------+-----------+------------------------------+
| Benchmark | isascii_a | isascii_b                    |
+===========+===========+==============================+
| timeit    | 11.7 ms   | 7.84 ms: 1.50x faster (-33%) |
+-----------+-----------+------------------------------+

2. bytes.decode('latin1')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('latin1')"

+-----------+----------+-----------------------------+
| Benchmark | latin1_a | latin1_b                    |
+===========+==========+=============================+
| timeit    | 60.3 ms  | 57.4 ms: 1.05x faster (-5%) |
+-----------+----------+-----------------------------+

3. bytes.decode('ascii')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('ascii')"

+-----------+---------+-----------------------------+
| Benchmark | ascii_a | ascii_b                     |
+===========+=========+=============================+
| timeit    | 48.5 ms | 47.1 ms: 1.03x faster (-3%) |
+-----------+---------+-----------------------------+

4. bytes.decode('utf8')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('utf8')"

+-----------+---------+-----------------------------+
| Benchmark | utf8_a  | utf8_b                      |
+===========+=========+=============================+
| timeit    | 48.3 ms | 47.1 ms: 1.03x faster (-3%) |
+-----------+---------+-----------------------------+

--
___ Python tracker <https://bugs.python.org/issue38252> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38321] Compiler warnings when building Python 3.8
Ma Lin added the comment: On my Windows system, some non-ASCII characters cause this warning: d:\dev\cpython\modules\expat\xmltok.c : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss. This patch fixes the warnings; it applies to the master/3.8 branches. https://github.com/animalize/cpython/commit/daced7575ec70ef1f888c6854760e230cda5ea64 Maybe this trivial problem is not worth a new commit; it can be fixed along with other warnings. -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue38321> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38321] Compiler warnings when building Python 3.8
Ma Lin added the comment: Other warnings: c:\vstinner\python\master\objects\longobject.c(420): warning C4244: 'function': conversion from 'unsigned __int64' to 'sdigit', possible loss of data c:\vstinner\python\master\objects\longobject.c(428): warning C4267: 'function': conversion from 'size_t' to 'sdigit', possible loss of data - These warnings only appear in the master branch; I will fix them at some point. (https://bugs.python.org/issue35696#msg352903) -- ___ Python tracker <https://bugs.python.org/issue38321> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38321] Compiler warnings when building Python 3.8
Ma Lin added the comment: > This file is copied directly from https://github.com/libexpat/libexpat/ > > project. Would you mind to propose your patch there? OK, I will report it there. -- ___ Python tracker <https://bugs.python.org/issue38321> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13153] IDLE 3.x on Windows exits when pasting non-BMP unicode
Ma Lin added the comment: > Thus this breaks editing the physical line past the astral character. We > cannot do anything with this. I tried; sadly, the experience is not very good. -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue13153> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38056] Overhaul Error Handlers section in codecs documentation
Ma Lin added the comment: PR 15732 became an overhaul: - replace/backslashreplace/surrogateescape were wrongly described as encoding-only; in fact they can also be used in decoding. - clarify the description of surrogatepass. - add more descriptions to each handler. - add two REPL examples. - add index entries for each error handler's name. - add default parameter values in codecs.rst - improve the term "text encoding". PR 15732 has a screenshot of the Error Handlers section. -- components: +Unicode nosy: +ezio.melotti, vstinner title: Add examples for common text encoding Error Handlers -> Overhaul Error Handlers section in codecs documentation ___ Python tracker <https://bugs.python.org/issue38056> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38582] Regular match overflow
Change by Ma Lin : -- nosy: +Ma Lin type: security -> ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38582] Regular match overflow
Ma Lin added the comment: A simpler reproduction:

```
import re

NUM = 99

# items = ['(001)', '(002)', '(003)', ..., '(NUM)']
items = [r'(%03d)' % i for i in range(1, 1+NUM)]
pattern = '|'.join(items)

# repl = '\1\2\3...\NUM'
temp = ('\\' + str(i) for i in range(1, 1+NUM))
repl = ''.join(temp)

text = re.sub(pattern, repl, '(001)')
print(text)

# if NUM == 99
# output: (001)
# if NUM == 100
# output: (001@)
# if NUM == 101
# output: (001@A)
```

-- components: +Regular Expressions nosy: +ezio.melotti, mrabarnett ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38582] re: backreference number in replace string can't >= 100
Ma Lin added the comment: Backreference numbers in a replacement string can't be >= 100: https://github.com/python/cpython/blob/v3.8.0/Lib/sre_parse.py#L1022-L1036 If no one takes this, I will try to fix this issue tomorrow. -- nosy: +serhiy.storchaka title: Regular match overflow -> re: backreference number in replace string can't >= 100 versions: +Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38582] re: backreference number in replace string can't >= 100
Ma Lin added the comment: @veaba Posting in English only is fine. > Is this actually needed? Maybe very few people dynamically generate such large patterns. > However, \g<...> is not accepted in a pattern. > in the "regex" module I added support for it in a pattern too. Yes, backreference numbers in a pattern also can't be >= 100. Supporting \g<...> in patterns is a good idea. Fixing this issue may create a backward-compatibility problem: the parser could confuse backreference numbers with octal escape numbers. Maybe clarifying the limit (<= 99) in the documentation is enough. -- ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
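For reference, the re module already accepts `\g<...>` in the replacement template, which sidesteps the digit ambiguity on that side; a small demonstration:

```python
import re

# \g<number> makes the group reference unambiguous even when a digit follows:
assert re.sub(r"(\d)", r"\g<1>0", "7") == "70"

# A plain \10 in the template would instead be parsed as a reference to
# group 10 (and raise "invalid group reference" here, since only one
# group exists in the pattern).
try:
    re.sub(r"(\d)", r"\10", "7")
except re.error:
    pass
```

This is why extending `\g<...>` to patterns, as the message above suggests, would avoid the backreference-vs-octal confusion without changing how plain `\ooo` escapes are parsed.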
[issue38582] re: backreference number in replace string can't >= 100
Ma Lin added the comment: Octal escape: \ooo Character with octal value ooo. As in Standard C, up to three octal digits are accepted. It only accepts UCS1 characters (ooo <= 0o377):

>>> ord('\377')
255
>>> len('\378')
2
>>> '\378' == '\37' + '8'
True

IMHO this is not very useful and creates confusion. Maybe it could be deprecated at the language level. -- ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38582] re: backreference number in replace string can't >= 100
Ma Lin added the comment: > I'd still retain \0 as a special case, since it really is useful. Yes, \0 may be widely used; I didn't think of that. Changing it would be troublesome, so let's keep it as is. -- ___ Python tracker <https://bugs.python.org/issue38582> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23692] Undocumented feature prevents re module from finding certain matches
Change by Ma Lin : -- nosy: +Ma Lin ___ Python tracker <https://bugs.python.org/issue23692> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37527] Timestamp conversion on windows fails with timestamps close to EPOCH
Ma Lin added the comment: issue29097 fixed the bug in `datetime.fromtimestamp()`. But this issue is about `datetime.timestamp()`, which is not fixed yet. -- ___ Python tracker <https://bugs.python.org/issue37527> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: ping -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43785] bz2 performance issue.
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43785> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43785] Remove RLock from BZ2File
Ma Lin added the comment: This change is backwards incompatible; it may break some code silently. If someone really needs better performance, they can write a BZ2File class without the RLock themselves; it should be easy. FYI, the zlib module was added in 1997, the bz2 module in 2002, and the lzma module in 2011. (Just curious about these years) -- ___ Python tracker <https://bugs.python.org/issue43785> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: > I don't really _like_ that this is a .h file acting as a C template to inject > effectively the same static code into each module that wants to use it... > Which I think is the concern Victor is expressing in a comment above. I think so too. The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If the core code is kept together, these defines can be put in thin wrappers in the _bz2module.c/_lzmamodule.c/zlibmodule.c files. This can be done now, but ideally it would be improved more thoroughly in 3.11. _PyBytesWriter behaves differently: the user may access the existing data as a plain buffer, which is impossible with _BlocksOutputBuffer. An API could be carefully designed to be efficient/flexible/elegant; then the code might be used at more sites in CPython. -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43787> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.
Ma Lin added the comment: I think this change is safe. The behavior should be exactly the same, except that the iterators are different objects (obj vs obj._buffer). -- ___ Python tracker <https://bugs.python.org/issue43787> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: > The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put > the core code together, these defines can be put in a thin wrapper in > _bz2module.c/_lzmamodule.c/zlibmodule.c files. I tried it, and it looks good. I will update the PR within one or two days. The code is more concise, and the review burden is not big. -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: Very sorry for updating at the last moment. But after this update, we should not need to touch the code again in the future, so I think it's worthwhile. Please review the last commit in PR 21740; the previous commits have not been changed. IMO reviewing may be more convenient with a Git client such as TortoiseGit.

The changes:

1. Move `Modules/blocks_output_buffer.h` to `Include/internal/pycore_blocks_output_buffer.h`. This keeps the `Modules` folder clean.

2. Ask the user to initialize the struct instance like this, and use assertions to check it:

       _BlocksOutputBuffer buffer = {.list = NULL};

   Then we no longer need to worry about whether buffer.list is uninitialized during error handling. There is an extra assignment, but it's beneficial to long-term code maintenance.

3. Change the type of BUFFER_BLOCK_SIZE from `int` to `Py_ssize_t`. The core code can remove a few type casts.

4. These functions return the allocated size on success, and -1 on failure:

       _BlocksOutputBuffer_Init()
       _BlocksOutputBuffer_InitAndGrow()
       _BlocksOutputBuffer_InitWithSize()
       _BlocksOutputBuffer_Grow()

   If the code is used at other sites, this API is simpler.

5. All functions are decorated with `inline`. If the compiler is smart enough, it's possible to eliminate some code when `max_length` is constant and < 0.

--
___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
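For readers unfamiliar with the design being reviewed, the core idea can be sketched in pure Python (a hypothetical illustration; the real implementation is C and stores the blocks in a PyList of bytes objects):

```python
class BlocksOutputBuffer:
    # Grow the output as a list of fixed blocks and join once at the end,
    # instead of repeatedly resizing (and copying) one large buffer.
    def __init__(self):
        self.blocks = []
        self.allocated = 0        # total bytes allocated so far

    def grow(self, block_size):
        block = bytearray(block_size)
        self.blocks.append(block)
        self.allocated += block_size
        return block              # the (de)compressor writes into this block

    def finish(self, unused_in_last_block):
        # Trim the unused tail of the last block, then join everything.
        if unused_in_last_block:
            del self.blocks[-1][-unused_in_last_block:]
        return b"".join(self.blocks)

buf = BlocksOutputBuffer()
buf.grow(4)[:] = b"abcd"          # first block fully used
buf.grow(4)[:2] = b"ef"           # second block half used
assert buf.finish(unused_in_last_block=2) == b"abcdef"
```

The sketch mirrors item 4 above in spirit only; the C functions additionally track `max_length`, handle allocation failure by returning -1, and have a single-block fast path in `_BlocksOutputBuffer_Finish()`.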
[issue41735] Thread locks in zlib module may go wrong in rare case
Ma Lin added the comment: Thanks for review. -- ___ Python tracker <https://bugs.python.org/issue41735> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: The above changes were made in this commit: split core code and wrappers 55705f6dc28ff4dc6183e0eb57312c885d19090a After that commit, there is a new commit that resolves the code conflicts introduced by PR 22126 an hour ago: Merge branch 'master' into blocks_output_buffer 45d752649925765b1b3cf39e9045270e92082164 Sorry to complicate the review again. I should have asked Łukasz Langa to merge PR 22126 after this issue was resolved, since resolving the code conflicts in PR 22126 would have been easier. For the change from 55705f6 to 45d7526, see the uploaded file (45d7526.diff); it can also be easily seen with a Git client. -- Added file: https://bugs.python.org/file49993/45d7526.diff ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: Thanks for reviewing this big patch. Your review makes the code better. -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Change by Ma Lin : -- pull_requests: +24429 pull_request: https://github.com/python/cpython/pull/25738 ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: I found a backward-incompatible behavior. Before the patch, in 64-bit builds, the zlib module allowed an initial size > UINT32_MAX. It created a bytes object, and used a sliding window to deal with the UINT32_MAX limit: https://github.com/python/cpython/blob/v3.9.4/Modules/zlibmodule.c#L183 After the patch, when init_size > UINT32_MAX, it raises a ValueError. PR 25738 fixes this backward incompatibility: if the initial size > UINT32_MAX, it clamps to UINT32_MAX rather than raising an exception. Moreover, if you don't mind, I would like to take this opportunity to rename the wrapper functions from Buffer_* to OutputBuffer_*, so that readers can easily distinguish between the input buffer and the output buffer. If you don't think it's necessary, you may merge PR 25738 as is. -- ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction
Ma Lin added the comment: Erlend, please take a look at this bug. -- ___ Python tracker <https://bugs.python.org/issue33376> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44114] Incorrect function signatures in dictobject.c
Change by Ma Lin : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue44114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Change by Ma Lin : -- pull_requests: +24779 pull_request: https://github.com/python/cpython/pull/26143 ___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module
Ma Lin added the comment: Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution. Please imagine this scenario:

- before the patch
- in a 64-bit build
- using the zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)

If the `bufsize` argument is set to the decompressed size, there used to be a fast path:

    zlib.decompress(data, bufsize=10*1024*1024*1024)

Fast path when (the initial size == the actual size):
https://github.com/python/cpython/blob/v3.9.5/Modules/zlibmodule.c#L424-L426
https://github.com/python/cpython/blob/v3.9.5/Objects/bytesobject.c#L3008-L3011

But in the current code, the initial size is clamped to UINT32_MAX, so there are two regressions:

1. double the RAM is allocated (~20 GiB: the blocks plus the final bytes object)
2. the data must be memcpy'd from the blocks to the final bytes object

PR 26143 uses a UINT32_MAX sliding window for the first block, so now the initial buffer size can be greater than UINT32_MAX. _BlocksOutputBuffer_Finish() already has a fast path for a single block.

Benchmark of this code: zlib.decompress(data, bufsize=10*1024*1024*1024)

            time       RAM
    before: 7.92 sec,  ~20 GiB
    after:  6.61 sec,  10 GiB

(AMD 3600X, DDR4-3200, decompressed data is 10_GiB * b'a')

Maybe some user code relies on this corner case. This should be the last revision; then there is no regression in any case.

--
___ Python tracker <https://bugs.python.org/issue41486> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
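The fast path being discussed is visible from Python: when `bufsize` already equals the exact decompressed size, zlib can fill a single allocation without growing it. A small demonstration (sizes scaled down from the 10 GiB scenario above):

```python
import zlib

payload = b"a" * 1_000_000
data = zlib.compress(payload)

# Passing the exact decompressed size as bufsize lets zlib fill one
# allocation of the right size instead of growing the buffer repeatedly.
out = zlib.decompress(data, bufsize=len(payload))
assert out == payload
```

When the real size exceeds `bufsize`, the buffer still grows correctly; `bufsize` is only an initial-size hint, which is why the > UINT32_MAX handling above matters for very large payloads.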
[issue43650] MemoryError on zip.read in shutil._unpack_zipfile
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43650> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44134] lzma: stream padding in xz files
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue44134> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44439] PickleBuffer doesn't have __len__ method
New submission from Ma Lin : Running this code raises an exception:

    import pickle
    import lzma
    import pandas as pd

    with lzma.open("test.xz", "wb") as file:
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

The exception:

    Traceback (most recent call last):
      File "E:\testlen.py", line 7, in <module>
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
      File "D:\Python39\lib\lzma.py", line 234, in write
        self._pos += len(data)
    TypeError: object of type 'pickle.PickleBuffer' has no len()

The exception is raised in the lzma.LZMAFile.write() method: https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238 PickleBuffer doesn't have a .__len__ method; is that intended?

-- messages: 395971 nosy: malin, pitrou priority: normal severity: normal status: open title: PickleBuffer doesn't have __len__ method ___ Python tracker <https://bugs.python.org/issue44439> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
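As a workaround (not necessarily the approach the eventual fix takes), the byte length of a bytes-like object that lacks `__len__` can be obtained through a memoryview:

```python
import pickle

pb = pickle.PickleBuffer(b"abcdef")

# len(pb) raises TypeError, but a memoryview exposes the size in bytes:
try:
    len(pb)
except TypeError:
    pass
assert memoryview(pb).nbytes == 6
```

Note that `nbytes` (rather than `len(memoryview(...))`) is the safe choice, since `len()` on a memoryview counts items, which differ from bytes for non-byte-sized formats.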
[issue44439] PickleBuffer doesn't have __len__ method
Ma Lin added the comment: Ok, I'm working on a PR. -- ___ Python tracker <https://bugs.python.org/issue44439> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44439] PickleBuffer doesn't have __len__ method
Change by Ma Lin : -- keywords: +patch pull_requests: +25350 stage: -> patch review pull_request: https://github.com/python/cpython/pull/26764 ___ Python tracker <https://bugs.python.org/issue44439> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44439] stdlib wrongly uses len() for bytes-like object
Ma Lin added the comment: I am checking all the .py files in the `Lib` folder. hmac.py has two len() bugs: https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212 https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214 I think PR 26764 is ready; it fixes the len() bugs in the bz2.py/lzma.py files. -- nosy: +christian.heimes title: PickleBuffer doesn't have __len__ method -> stdlib wrongly uses len() for bytes-like object ___ Python tracker <https://bugs.python.org/issue44439> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44458] Duplicate symbol _BUFFER_BLOCK_SIZE when statically linking multiple modules
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue44458> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com