[issue23224] bz2/lzma: Compressor/Decompressor objects are only initialized in __init__

2021-12-19 Thread Ma Lin


Ma Lin  added the comment:

These can be done in the .__new__() method:
- create the thread lock
- create the (de)compression context
- initialize the (de)compressor state

In the .__init__() method, only set the (de)compression parameters, and prevent
.__init__() from being called multiple times.

This pattern works fine in my pyzstd module (Python bindings for the zstd library).
But I think very few people will encounter this problem, so we can leave it.
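Below is a minimal pure-Python sketch of the __new__/__init__ split described above. It assumes zlib as a stand-in for the real compression library, and `SketchCompressor` and its attributes are hypothetical names, not the bz2/lzma or pyzstd API. Because `zlib.compressobj()` fixes the level at construction time, the sketch creates the context on first use rather than literally inside `__new__`. The property it demonstrates is the one that matters for this issue: every method still works even if `__init__` never runs.

```python
import threading
import zlib


class SketchCompressor:
    """Hypothetical illustration of the __new__/__init__ split (not the real bz2/lzma code)."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Everything the methods need is created here, so the object is usable
        # even when __init__ is skipped, e.g. SketchCompressor.__new__(SketchCompressor).
        self._lock = threading.Lock()
        self._level = zlib.Z_DEFAULT_COMPRESSION  # default parameter
        self._ctx = None  # built on first use from the current parameters
        self._init_called = False
        return self

    def __init__(self, level=zlib.Z_DEFAULT_COMPRESSION):
        # __init__ only sets parameters, and refuses to run twice.
        if self._init_called:
            raise RuntimeError("__init__ must not be called more than once")
        self._init_called = True
        self._level = level

    def _context(self):
        # zlib takes the level at creation time, hence lazy creation here.
        if self._ctx is None:
            self._ctx = zlib.compressobj(self._level)
        return self._ctx

    def compress(self, data):
        with self._lock:
            return self._context().compress(data)

    def flush(self):
        with self._lock:
            return self._context().flush()
```

Creating an instance with `SketchCompressor.__new__(SketchCompressor)` and calling `compress()`/`flush()` then falls back to the defaults instead of touching uninitialized state.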

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue23224>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin


New submission from Ma Lin :

These methods are METH_NOARGS, so in all cases the second parameter will be NULL.

{"_checkClosed",   _PyIOBase_check_closed, METH_NOARGS},
{"_checkSeekable", _PyIOBase_check_seekable, METH_NOARGS},
{"_checkReadable", _PyIOBase_check_readable, METH_NOARGS},
{"_checkWritable", _PyIOBase_check_writable, METH_NOARGS},

--
components: IO
messages: 409672
nosy: malin
priority: normal
severity: normal
status: open
title: Remove unnecessary check in _IOBase._check*() methods
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +28606
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30397

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-19 Thread Ma Lin


Change by Ma Lin :


--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46255>
___



[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin


New submission from Ma Lin :

Since CPython 3.0.0, the checksums are always truncated to `unsigned int`:
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950
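A quick way to see the behaviour in question: in Python 3 both checksums come back as non-negative integers below 2**32, even when the top bit is set. The input `b"123456789"` is the conventional CRC test string, whose CRC-32 is 0xCBF43926 and whose Adler-32 is 0x091E01DE.

```python
import binascii
import zlib

data = b"123456789"  # the standard CRC check input

crc = zlib.crc32(data)
# The result is an unsigned 32-bit value even though its high bit is set.
assert crc == 0xCBF43926
assert 0 <= crc <= 0xFFFFFFFF
# binascii.crc32 computes the same checksum.
assert binascii.crc32(data) == crc
# The common "& 0xffffffff" masking idiom is therefore a no-op in Python 3.
assert crc & 0xFFFFFFFF == crc

adler = zlib.adler32(data)
assert 0 <= adler <= 0xFFFFFFFF
```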

--
assignee: docs@python
components: Documentation, Library (Lib)
messages: 415386
nosy: docs@python, gregory.p.smith, malin
priority: normal
severity: normal
status: open
title: Remove an invalid versionchanged in doc
versions: Python 3.10, Python 3.11, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30046
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31955

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Remove invalid versionchanged in doc

2022-03-17 Thread Ma Lin


Ma Lin  added the comment:

The `binascii.crc32` doc also has this invalid note:
doc: https://docs.python.org/3/library/binascii.html#binascii.crc32
3.0.0 code: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035

In addition, `binascii.crc32` has a `USE_ZLIB_CRC32` code path, but it's buggy.
The length parameter of zlib's `crc32()` function is an `unsigned int`, so when
the `USE_ZLIB_CRC32` code path is used and the data is larger than 4 GiB, the
result is wrong.
Should we remove the `USE_ZLIB_CRC32` code path from `binascii.c`, or fix it?

`USE_ZLIB_CRC32` code path in binascii.c (buggy code):
https://github.com/python/cpython/blob/v3.11.0a6/Modules/binascii.c#L756-L767
crc32 in zlibmodule.c, which feeds the data in chunks of at most UINT_MAX bytes (correct code):
https://github.com/python/cpython/blob/v3.11.0a6/Modules/zlibmodule.c#L1436
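The chunking approach used by zlibmodule.c can be sketched at the Python level: split the input into pieces no longer than the C `unsigned int` limit and chain them by passing the running value back in. `chunked_crc32` is a hypothetical helper, and `UINT_MAX` assumes a 32-bit `unsigned int`. Python's own `zlib.crc32()` already does this internally, so the sketch only illustrates the chaining. The small `chunk_size` in the usage exists just to exercise several calls.

```python
import zlib

# zlib's C crc32() takes its length as an unsigned int, assumed here to be
# 32 bits wide, so one call can cover at most UINT_MAX bytes.
UINT_MAX = 2**32 - 1


def chunked_crc32(data, value=0, chunk_size=UINT_MAX):
    """CRC-32 of data, feeding at most chunk_size bytes per zlib.crc32() call."""
    view = memoryview(data).cast("B")  # byte-indexed view, no copy
    for start in range(0, len(view), chunk_size):
        # Passing the running value continues the checksum across chunks.
        value = zlib.crc32(view[start:start + chunk_size], value)
    return value


print(hex(chunked_crc32(b"123456789", chunk_size=2)))
```

Because CRC-32 can be continued from a previous value, the chunked result is identical to a one-shot call over the whole buffer.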

--
title: Remove an invalid versionchanged in doc -> Remove invalid versionchanged 
in doc

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2022-03-19 Thread Ma Lin


Ma Lin  added the comment:

`_Stream.write` method in tarfile.py also has this code:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434

But this bug will not be triggered, because this method is always called with
bytes data.
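To illustrate why `len()` is the wrong size for a general bytes-like object (the subject of this issue), here is a short check using `array.array`, whose items are wider than one byte. The variable names are arbitrary.

```python
from array import array

buf = array("i", [1, 2, 3])  # a bytes-like object with multi-byte items
view = memoryview(buf)

# len() counts items, not bytes, so it is wrong as a byte count here.
assert len(buf) == 3
# nbytes is the size of the underlying buffer in bytes.
assert view.nbytes == 3 * buf.itemsize
# Casting the view to unsigned bytes makes len() and nbytes agree.
assert len(view.cast("B")) == view.nbytes
```

For plain `bytes` the two counts coincide, which is why code that only ever receives `bytes` never hits the problem.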

The `_ConnectionBase.send_bytes` method in multiprocessing/connection.py can be
micro-optimized:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L193
That can be done in another issue.

So I think this issue can be closed.

--
stage: patch review -> resolved
status: pending -> closed

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30090
pull_request: https://github.com/python/cpython/pull/32002

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin


Ma Lin  added the comment:

PR 32002 is for the 3.10 and 3.9 branches.

--

___
Python tracker 
<https://bugs.python.org/issue47040>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-21 Thread Ma Lin


Ma Lin  added the comment:

If we run this code, would it be slower?

bytes_hash = hash(bytes_data)
bytes_hash = hash(bytes_data)  # get hash twice

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin


Ma Lin  added the comment:

Since hash() is a public function, some users may use the hash value to manage
bytes objects in their own way, and they may then see a performance regression.

As a rough example, dispatching data to 16 servers:

h = hash(b)
sendto(server_number=h & 0xF, data=b)
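A runnable version of the dispatch example; `server_for` and `NUM_SERVERS` are hypothetical names, and the original's `sendto` is left out. Each call needs the hash of the whole bytes object; whether a repeated call reuses a cached value depends on the CPython version, which is exactly what is being discussed here. Note that bytes hashes are randomized per process (see `PYTHONHASHSEED`), so this kind of dispatch is only consistent within one interpreter process.

```python
NUM_SERVERS = 16  # must be a power of two for the "& (N - 1)" trick


def server_for(data: bytes) -> int:
    """Pick one of NUM_SERVERS buckets from the bytes hash, as in the example above."""
    # hash() may be negative, but "&" with a positive mask always yields 0..15.
    return hash(data) & (NUM_SERVERS - 1)


payload = b"payload" * 1000
print(server_for(payload))
```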

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin


Ma Lin  added the comment:

RAM is now relatively cheaper than CPU.
1 million bytes objects use an additional 7.629 MiB of RAM for ob_shash
(1_000_000 * 8 / 1024 / 1024).
Removing ob_shash, on the other hand, causes a hash() performance regression.
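The memory figure above is straightforward arithmetic; a quick check, assuming an 8-byte `Py_hash_t` as on 64-bit builds:

```python
N_OBJECTS = 1_000_000
OB_SHASH_SIZE = 8  # bytes per object, assuming a 64-bit Py_hash_t

extra_bytes = N_OBJECTS * OB_SHASH_SIZE
extra_mib = extra_bytes / 1024 / 1024
print(f"{extra_mib:.3f} MiB")  # 7.629 MiB
```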

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-23 Thread Ma Lin


Ma Lin  added the comment:

If a bytes object is put into multiple dicts/sets, the hash needs to be computed
multiple times. This seems like a common usage.

bytes is a very basic type, and users may use it in various ways. Inexperienced
users may also check the same bytes object against dicts/sets many times.

FYI, timings for 1 GiB of data:

function           seconds
hash()             0.40
binascii.crc32()   1.66   (Gregory P. Smith is trying to improve this)
zlib.crc32()       0.65
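One possible way to reproduce this kind of comparison is sketched below. The helper names are hypothetical, and the numbers it prints will depend on the machine and build. Each measurement hashes a freshly created bytes object, so `hash()` cannot return a value cached from an earlier call. The data comes from `os.urandom()` rather than `bytes(size)`, because a large zero-filled buffer may be backed by lazily mapped zero pages.

```python
import binascii
import os
import time
import zlib


def time_first_call(func, size):
    """Seconds taken by one call of func on a fresh bytes object of `size` bytes."""
    data = os.urandom(size)  # new object each time, so hash() has no cached value
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def compare(size=1 << 30):  # 1 GiB by default, matching the table above
    funcs = {
        "hash()": hash,
        "binascii.crc32()": binascii.crc32,
        "zlib.crc32()": zlib.crc32,
    }
    return {name: time_first_call(func, size) for name, func in funcs.items()}
```

Calling `compare()` and printing each entry gives the same layout as the table. Single-shot timings are noisy, so repeating the run (or using pyperf) is advisable.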

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue46864] Deprecate ob_shash in BytesObject

2022-03-24 Thread Ma Lin

Ma Lin  added the comment:

> I posted remove-bytes-hash.patch in this issue. Would you measure how this 
> affects whole application performance rather than micro benchmarks?

I guess there is not much difference in the benchmarks.
But if a bytes object is put into multiple dicts/sets and len(bytes_key) is
large, hashing will take a long time (1 GiB takes 0.40 seconds on an i5-11500
with DDR4-3200).
The length of bytes can be arbitrary, so the computing time may vary greatly.

Is it possible to let code objects use other types? In addition to ob_shash,
maybe the extra \x00 byte at the end could be saved.

--

___
Python tracker 
<https://bugs.python.org/issue46864>
___



[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

Thanks for your review.

3.11 has a more powerful re module; thank you also for rebasing the atomic
grouping code.

--

___
Python tracker 
<https://bugs.python.org/issue35859>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

The approaches in my PRs are suboptimal, so I closed them.

The number of REPEATs can be counted when compiling a pattern, and a
`SRE_REPEAT` array with that many items can be allocated in `SRE_STATE`.

It seems that at any time each REPEAT has only one active instance, so a
`SRE_REPEAT` array is sufficient.
The regex module does it like this:
https://github.com/mrabarnett/mrab-regex/blob/hg/regex_3/_regex.c#L18287-L18288

Could the number of REPEATs be stored in `SRE_OP_INFO`, and a field added to
`SRE_OP_REPEAT` to indicate the index of each REPEAT?
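A toy Python model of the proposal, for illustration only: all class and function names are hypothetical stand-ins, not the real `_sre` structures, and the op list is flat, whereas real compiled patterns are nested. The compile step numbers each REPEAT once; the match state then preallocates one slot per REPEAT and reuses it, so no per-entry allocation is needed and nothing can leak on an early exit. It relies on the assumption stated above that each REPEAT has at most one active instance.

```python
from dataclasses import dataclass


@dataclass
class RepeatOp:
    """Hypothetical stand-in for an SRE_OP_REPEAT node in a compiled pattern."""
    min_count: int
    max_count: int
    index: int = -1  # slot in the per-match repeat array, set at compile time


@dataclass
class RepeatContext:
    """Hypothetical stand-in for SRE_REPEAT: runtime data of one active REPEAT."""
    count: int = 0
    last_pos: int = -1


def assign_repeat_indices(ops):
    """Compile step: number every REPEAT once and return how many there are."""
    count = 0
    for op in ops:
        if isinstance(op, RepeatOp):
            op.index = count
            count += 1
    return count


class MatchState:
    """Hypothetical stand-in for SRE_STATE holding a preallocated repeat array."""

    def __init__(self, repeat_count):
        # Allocated once per match, so there is nothing to free on early exits.
        self.repeats = [RepeatContext() for _ in range(repeat_count)]

    def enter_repeat(self, op):
        # Reuse the slot; assumes at most one active instance per REPEAT.
        ctx = self.repeats[op.index]
        ctx.count = 0
        ctx.last_pos = -1
        return ctx
```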

--

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

Please don't merge this too close to the 3.11 beta1 release date; I'll submit
PRs after this is merged.

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-30 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30265
pull_request: https://github.com/python/cpython/pull/32188

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-03-30 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30266
pull_request: https://github.com/python/cpython/pull/32188

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-03-31 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30298
pull_request: https://github.com/python/cpython/pull/32223

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-01 Thread Ma Lin


New submission from Ma Lin :

`bytes(m)` can be replaced by memoryview.cast('B'), so the data does not need
to be copied.

m = memoryview(buf)
# HACK for byte-indexing of non-bytewise buffers (e.g. array.array)
if m.itemsize > 1:
    m = memoryview(bytes(m))
n = len(m)

https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L190-L194
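A sketch of the proposed change, with `byte_view` as a hypothetical helper name: `cast("B")` reinterprets the same memory as unsigned bytes instead of copying it. One caveat: `memoryview.cast()` only accepts C-contiguous buffers, so non-contiguous inputs would still need the copying path.

```python
from array import array


def byte_view(buf):
    """Byte-indexed memoryview of buf without copying (C-contiguous buffers only)."""
    m = memoryview(buf)
    if m.itemsize > 1:
        m = m.cast("B")  # reinterpret the same memory as unsigned bytes
    return m


data = array("i", [1, 2, 3])
m = byte_view(data)
assert len(m) == m.nbytes == 3 * data.itemsize
data[0] = 42  # the view shares memory with the array...
assert m.tobytes() == data.tobytes()  # ...so it sees the change: no copy was made
```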

--
components: Library (Lib)
messages: 416538
nosy: malin
priority: normal
severity: normal
status: open
title: multiprocessing: micro-optimize Connection.send_bytes() method
type: resource usage
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47199>
___



[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-01 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30318
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32247

___
Python tracker 
<https://bugs.python.org/issue47199>
___



[issue47152] Reorganize the re module sources

2022-04-02 Thread Ma Lin


Ma Lin  added the comment:

In the `Modules` folder, there are the _sre.c/sre.h/sre_constants.h/sre_lib.h
files. Will they be put into a folder?

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30344
pull_request: https://github.com/python/cpython/pull/32283

___
Python tracker 
<https://bugs.python.org/issue23689>
___



[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

Match.regs is an undocumented attribute; it seems to have existed since 1991.
Can it be removed?

https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

> cryptic name

In very early versions, "mark" was called register/region.
https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52

If spans are accessed repeatedly, it's faster than calling Match.span().
Maybe consider renaming it and making it a public attribute.
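For reference, here is a small check of what `Match.regs` holds compared with `Match.span()`. Since the attribute is undocumented, the sketch guards it with `hasattr` rather than assuming it exists in every version.

```python
import re

m = re.match(r"(a)(b)?", "ac")
# Match.span(i) for the whole match (group 0) and each capturing group.
spans = tuple(m.span(i) for i in range(m.re.groups + 1))
assert spans == ((0, 1), (0, 1), (-1, -1))  # an unmatched group gives (-1, -1)

# Match.regs is undocumented; guard in case a Python version drops it.
if hasattr(m, "regs"):
    assert m.regs == spans
```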

--

___
Python tracker 
<https://bugs.python.org/issue47152>
___



[issue47211] Remove re.template() and re.TEMPLATE

2022-04-06 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue47211>
___



[issue47248] Possible slowdown of regex searching in 3.11

2022-04-07 Thread Ma Lin


Ma Lin  added the comment:

Could you give the two versions? I will do a git bisect.

I tested 356997c~1 and 356997c [1], MSVC 2022 non-PGO release build:

### regex_dna ###
Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower
Not significant

### regex_effbot ###
Mean +- std dev: 2.47 ms +- 0.01 ms -> 2.46 ms +- 0.02 ms: 1.00x faster
Not significant

### regex_v8 ###
Mean +- std dev: 21.7 ms +- 0.1 ms -> 22.4 ms +- 0.1 ms: 1.03x slower
Significant (t=-30.82)

https://github.com/python/cpython/commit/35699721a3391175d20e9ef03d434675b496

--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue47248>
___



[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-07 Thread Ma Lin

New submission from Ma Lin :

These changes reduce sizeof(match_context):
- 32-bit build: 36 bytes, no change.
- 64-bit build: 72 bytes -> 56 bytes.

sre uses a stack and the `match_context` struct to simulate recursive calls, so
a smaller struct brings:
- deeper recursive calls
- less memory consumption
- fewer memory reallocations

Here is a test: with the stack size limited to 1 GiB, the maximum usable value
of n is:

re.match(r'(ab)*', n * 'ab')   # need to save MARKs
72 bytes: n = 11,184,808
64 bytes: n = 12,201,609
56 bytes: n = 13,421,770

re.match(r'(?:ab)*', n * 'ab') # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,078
56 bytes: n = 16,777,213

1,073,741,823 capturing groups should be enough for almost all users.
If it were limited to 16,383 (a 2-byte integer), the context size could be
reduced further. But some machine-generated patterns may have more capturing
groups than that.

1️⃣ Performance:

Before
regex_dna: Mean +- std dev: 149 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.22 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark[1]: 13.9 sec +- 0.0 sec

Commit 1. limit the maximum capture group to 1,073,741,823
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.0 sec

Commit 2. further reduce sizeof(SRE(match_context))
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.2 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.1 sec

If the types of toplevel/jump are further changed from int to char,
sizeof(match_context) in a 32-bit build is reduced from 36 to 32 bytes (still 56
in a 64-bit build). But it's slower on the 64-bit build, so I didn't adopt it:
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.18 ms +- 0.01 ms
regex_v8: Mean +- std dev: 22.4 ms +- 0.1 ms
my benchmark: 14.1 sec +- 0.0 sec

2️⃣ The type of match_context.count is Py_ssize_t
- Changing it to a 4-byte integer would require modifying some engine code.
- Keeping it as Py_ssize_t allows SRE_MAXREPEAT to become >= 4 GiB in future
  versions. Currently SRE_MAXREPEAT can't be >= 4 GiB.
So the type of match_context.count is unchanged.

[1] My re benchmark, which uses 16 patterns to process 100 MiB of text data:
https://github.com/animalize/re_benchmarks

--
components: Library (Lib)
messages: 416960
nosy: ezio.melotti, malin, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
status: open
title: re: limit the maximum capturing group to 1,073,741,823, reduce 
sizeof(match_context).
type: resource usage
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47256>
___



[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30437
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32411

___
Python tracker 
<https://bugs.python.org/issue47256>
___



[issue47248] Possible slowdown of regex searching in 3.11

2022-04-08 Thread Ma Lin


Ma Lin  added the comment:

> Possibly related to the new atomic grouping support from GH-31982?

It seems unlikely.
I will run some benchmarks for this issue; more information (version/platform)
is welcome.

--

___
Python tracker 
<https://bugs.python.org/issue47248>
___



[issue37907] speed-up PyLong_As*() for large longs

2019-08-22 Thread Ma Lin


Change by Ma Lin :


--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue37907>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-02 Thread Ma Lin


New submission from Ma Lin :

Commit 5e63ab0 replaced a macro with this inline function:

static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)


However, when this function is invoked with a value where
`sizeof(value) < sizeof(long long)`, there is an unnecessary type conversion.

For example, on a 32-bit platform, if `value` is `Py_ssize_t`, it needs to be
converted to the 8-byte `long long` type.

The following assembly code is the beginning part of 
`PyLong_FromSsize_t(Py_ssize_t v)` function.
(32-bit x86 build generated by GCC 9.2, with `-m32 -O2` option)

Using the macro (before commit 5e63ab0):
        mov     eax, DWORD PTR [esp+4]
        add     eax, 5
        cmp     eax, 261
        ja      .L2
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints
        add     DWORD PTR [eax], 1
        ret
.L2:
        jmp     PyLong_FromSsize_t_rest(int)

Using the inline function:
        push    ebx
        mov     eax, DWORD PTR [esp+8]
        mov     edx, 261
        mov     ecx, eax
        mov     ebx, eax
        sar     ebx, 31
        add     ecx, 5
        adc     ebx, 0
        cmp     edx, ecx
        mov     edx, 0
        sbb     edx, ebx
        jc      .L7
        cwde
        sal     eax, 4
        add     eax, OFFSET FLAT:small_ints+80
        add     DWORD PTR [eax], 1
        pop     ebx
        ret
.L7:
        pop     ebx
        jmp     PyLong_FromSsize_t_rest(int)

On 32-bit x86, the 8-byte `long long` type is held in two registers, so the
machine code is much longer than the macro version.
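The macro version's three-instruction check (`add eax, 5` / `cmp eax, 261` / `ja`) is the usual trick of testing a signed range with a single unsigned comparison: adding 5 maps -5..256 onto 0..261 and wraps every negative value to a huge unsigned one. Here is a Python model of it that simulates a 32-bit register with a mask; the function names are hypothetical. With a `long long` argument the same comparison has to be done on a 64-bit value, which on 32-bit x86 needs the register pair seen in the second listing.

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257
MASK32 = 0xFFFFFFFF  # models a 32-bit unsigned register


def is_small_int(ival):
    """The macro's test: -NSMALLNEGINTS <= ival < NSMALLPOSINTS."""
    return -NSMALLNEGINTS <= ival < NSMALLPOSINTS


def is_small_int_one_compare(ival):
    """What 'add eax, 5 / cmp eax, 261 / ja' computes for a 32-bit int."""
    biased = (ival + NSMALLNEGINTS) & MASK32  # negatives wrap to huge values
    return biased <= NSMALLNEGINTS + NSMALLPOSINTS - 1  # 5 + 257 - 1 == 261
```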

At least these hot functions suffer from this:
  PyObject* PyLong_FromSsize_t(Py_ssize_t v)
  PyObject* PyLong_FromLong(long v)

Replacing the inline function with a macro version will fix this:
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

If you want to see the assembly code generated by major compilers, you can paste
the attached file demo.c into https://godbolt.org/
- demo.c was originally written by Greg Price.
- use `-m32 -O2` to generate a 32-bit build.

--
components: Interpreter Core
files: demo.c
messages: 351052
nosy: Greg Price, Ma Lin, aeros167, mark.dickinson, rhettinger, sir-sigurd
priority: normal
severity: normal
status: open
title: inline function generates slightly inefficient machine code
versions: Python 3.9
Added file: https://bugs.python.org/file48583/demo.c

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-02 Thread Ma Lin


Ma Lin  added the comment:

There will be a new commit either way; replacing it with a macro version also
looks good.

I have no preference; both are fine.

--

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38037] Assertion failed: object has negative ref count

2019-09-05 Thread Ma Lin


New submission from Ma Lin :

Adding these two lines to /Objects/longobject.c will disable the "preallocated 
small integer pool":

#define NSMALLPOSINTS  0
#define NSMALLNEGINTS  0

Then run this reproducer (attached):

from enum import IntEnum
import _signal

class Handlers(IntEnum):
    A = _signal.SIG_DFL
    B = _signal.SIG_IGN

When the interpreter exits, you get this error:

d:\dev\cpython\PCbuild\win32>python_d.exe d:\a.py
d:\dev\cpython\include\object.h:541: _Py_NegativeRefcount: Assertion 
failed: object has negative ref count

Fatal Python error: _PyObject_AssertFailed

Current thread 0x200c (most recent call first):

The 3.8 and 3.9 branches are affected.
I'm sorry, this issue is beyond my ability.

--
files: reproduce.py
messages: 351196
nosy: Ma Lin
priority: normal
severity: normal
status: open
title: Assertion failed: object has negative ref count
versions: Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48594/reproduce.py

___
Python tracker 
<https://bugs.python.org/issue38037>
___



[issue38037] Assertion failed: object has negative ref count

2019-09-05 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15355
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15701

___
Python tracker 
<https://bugs.python.org/issue38037>
___



[issue38037] Assertion failed: object has negative ref count

2019-09-05 Thread Ma Lin


Ma Lin  added the comment:

I did a git bisect; this is the first bad commit:
https://github.com/python/cpython/commit/9541bd321a94f13dc41163a5d7a1a847816fac84

Adding the people involved to the nosy list.

--
nosy: +berker.peksag, nanjekyejoannah

___
Python tracker 
<https://bugs.python.org/issue38037>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-05 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15365
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15710

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-05 Thread Ma Lin


Ma Lin  added the comment:

Either reverting commit 5e63ab0 or merging PR 15710 is fine.

--

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-06 Thread Ma Lin


Ma Lin  added the comment:

This range has not changed since the "preallocated small integer pool" was
introduced:

#define NSMALLPOSINTS   257
#define NSMALLNEGINTS   5

The commit (Jan 2007):
https://github.com/python/cpython/commit/ddefaf31b366ea84250fc5090837c2b764a04102


Is it worth increasing the range?
FYI, the `small_ints` size in an MSVC 2017 build:

32-bit build:
sizeof(PyLongObject)   16 bytes
sizeof(small_ints)     4192 bytes

64-bit build:
sizeof(PyLongObject)   32 bytes
sizeof(small_ints)     8384 bytes
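The `small_ints` sizes follow directly from the range: the pool holds one preallocated object for each value from -5 to 256. A quick check of the arithmetic, where the `sizeof(PyLongObject)` values are the ones measured above, not computed:

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257
n_small = NSMALLNEGINTS + NSMALLPOSINTS  # cached ints -5 .. 256 inclusive

# sizeof(PyLongObject) as reported above for the MSVC 2017 builds.
for build, sizeof_pylong in (("32-bit", 16), ("64-bit", 32)):
    print(f"{build}: {n_small} * {sizeof_pylong} = {n_small * sizeof_pylong} bytes")
# 32-bit: 262 * 16 = 4192 bytes
# 64-bit: 262 * 32 = 8384 bytes
```

Any increase of the range grows the pool linearly by one `PyLongObject` per extra value.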

--

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38037] reference counter issue in signal module

2019-09-06 Thread Ma Lin


Change by Ma Lin :


--
title: Assertion failed: object has negative ref count -> reference counter 
issue in signal module

___
Python tracker 
<https://bugs.python.org/issue38037>
___



[issue26868] Document PyModule_AddObject's behavior on error

2019-09-07 Thread Ma Lin


Change by Ma Lin :


--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue26868>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-07 Thread Ma Lin


Ma Lin  added the comment:

> This change produces tiny, but measurable speed-up for handling small ints

I didn't get a measurable change. I ran this command a dozen times and took the
best result:

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)" --duplicate=1000

before: Mean +- std dev: 771 ns +- 16 ns
after:  Mean +- std dev: 770 ns +- 10 ns

Environment:
64-bit release build by MSVC 2017
CPU: i3 4160, System: latest Windows 10 64-bit

Checking the machine code on godbolt.org, x64 MSVC v19.14 saves only one
instruction:
movsxd  rax, ecx

x86-64 GCC 9.2 saves two instructions:
lea eax, [rdi+5]
cdqe

--

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue38056] Add examples for common text encoding Error Handlers

2019-09-08 Thread Ma Lin

New submission from Ma Lin :

The text descriptions in `Error Handlers` are not very friendly to novices.
https://docs.python.org/3/library/codecs.html#error-handlers

For example:

'xmlcharrefreplace'
Replace with the appropriate XML character reference (only for encoding).  
Implemented in :func:`xmlcharrefreplace_errors`. 

'backslashreplace'
Replace with backslashed escape sequences. Implemented in 
:func:`backslashreplace_errors`.

'namereplace'
Replace with ``\N{...}`` escape sequences (only for encoding).  Implemented 
in :func:`namereplace_errors`.

Novices may not know what these are.
Some examples may help readers understand them more intuitively.
A screenshot of the effect is attached.

I picked two characters:
ß  https://www.compart.com/en/unicode/U+00DF
♬ https://www.compart.com/en/unicode/U+266C
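One possible shape for such examples, using the two characters above with ASCII as the target encoding. The choice of ASCII and the variable names are assumptions for illustration, not part of the proposed doc change.

```python
text = "ß♬"  # U+00DF and U+266C, the two characters picked above

handlers = ("ignore", "replace", "xmlcharrefreplace",
            "backslashreplace", "namereplace")
encoded = {h: text.encode("ascii", errors=h) for h in handlers}
for h, result in encoded.items():
    print(f"{h:<18} {result!r}")

# Expected values:
#   ignore             -> b''
#   replace            -> b'??'
#   xmlcharrefreplace  -> b'&#223;&#9836;'
#   backslashreplace   -> b'\\xdf\\u266c'
#   namereplace        -> b'\\N{LATIN SMALL LETTER SHARP S}\\N{BEAMED SIXTEENTH NOTES}'

# 'strict' (the default) raises instead of substituting anything.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # ordinal not in range(128)

# backslashreplace also works when decoding; xmlcharrefreplace and namereplace do not.
print(b"\xdf".decode("ascii", errors="backslashreplace"))  # \xdf
```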

--
assignee: docs@python
components: Documentation
files: effect.png
messages: 351329
nosy: Ma Lin, docs@python
priority: normal
severity: normal
status: open
title: Add examples for common text encoding Error Handlers
versions: Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48599/effect.png

___
Python tracker 
<https://bugs.python.org/issue38056>
___



[issue38056] Add examples for common text encoding Error Handlers

2019-09-08 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15386
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15732

___
Python tracker 
<https://bugs.python.org/issue38056>
___



[issue38037] reference counter issue in signal module

2019-09-09 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +15407
pull_request: https://github.com/python/cpython/pull/15753

___
Python tracker 
<https://bugs.python.org/issue38037>
___



[issue38015] inline function generates slightly inefficient machine code

2019-09-09 Thread Ma Lin


Ma Lin  added the comment:

PR 15710 has been merged into master, but the merge message is not shown
here.
Commit: 
https://github.com/python/cpython/commit/6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2

Maybe this issue can be closed.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue38015>
___



[issue21872] LZMA library sometimes fails to decompress a file

2019-09-13 Thread Ma Lin


Ma Lin  added the comment:

Some notes:

1. In liblzma, these missing bytes are copied inside the `dict_repeat()` function
(lines 788-790 of lzma_decoder.c):

    case SEQ_COPY:
        // Repeat len bytes from distance of rep0.
        if (unlikely(dict_repeat(&dict, rep0, &len))) {

See liblzma's source code (xz-5.2 branch):
https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/lzma/lzma_decoder.c

2. Earlier replies said that xz's command-line tools can extract the problematic
files successfully.

This is because xz checks `if (avail_out == 0)` first, and only then checks
`if (avail_in == 0)`.
See the `uncompress()` function in this source file (xz-5.2 branch):
https://git.tukaani.org/?p=xz.git;a=blob;f=src/xzdec/xzdec.c;hb=refs/heads/v5.2

This check order happens to avoid the problem.
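In Python, the same ordering can be expressed with `lzma.LZMADecompressor` and its `max_length` argument: keep draining output while `needs_input` is False, and only then feed more input. This is a minimal sketch; `decompress_bounded` and its chunk sizes are hypothetical, and it is not the fix that was applied to the lzma module.

```python
import lzma


def decompress_bounded(data: bytes, out_chunk: int = 64 * 1024,
                       in_chunk: int = 8 * 1024) -> bytes:
    """Hypothetical helper: decompress an .xz/.lzma payload, never asking
    for more than `out_chunk` bytes of output per call."""
    decomp = lzma.LZMADecompressor()
    pieces = []
    pos = 0
    while not decomp.eof:
        if not decomp.needs_input:
            # The previous call stopped because the output limit was hit
            # (the "avail_out == 0" case): drain it before feeding input.
            pieces.append(decomp.decompress(b"", out_chunk))
        elif pos < len(data):
            # Only now consider the input side (the "avail_in" case).
            chunk = data[pos:pos + in_chunk]
            pos += len(chunk)
            pieces.append(decomp.decompress(chunk, out_chunk))
        else:
            # Input is exhausted but no end-of-stream marker was seen.
            raise EOFError("compressed data ended before the end-of-stream marker")
    return b"".join(pieces)
```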

--

___
Python tracker 
<https://bugs.python.org/issue21872>
___



[issue38205] Python no longer compiles without small integer singletons

2019-09-17 Thread Ma Lin


Ma Lin  added the comment:

This commit changed Py_UNREACHABLE() five days ago:

https://github.com/python/cpython/commit/3ab61473ba7f3dca32d779ec2766a4faa0657923

If this change is reverted, Python compiles successfully.

--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue38205>
___



[issue38205] Python no longer compiles without small integer singletons

2019-09-17 Thread Ma Lin


Ma Lin  added the comment:

We can change Py_UNREACHABLE() to assert(0) in longobject.c,
or revert that commit's change to Py_UNREACHABLE().

--

___
Python tracker 
<https://bugs.python.org/issue38205>
___



[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)

2019-09-17 Thread Ma Lin


Ma Lin  added the comment:

> It's not clear to me if anyone benchmarked to see if the
> conversion to a macro had any measurable performance benefit.

I tested it that day, using the same command:

python.exe -m pyperf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)" --duplicate=1000

As I recall, the results were:
inline function: 1.6  us
macro version  : 1.27 us
(32-bit release build by MSVC 2017)

Since the difference was so large, I ran it only once for each version.
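For context: the benchmark consumes range(256), and every value it yields lies in CPython's small-integer cache (-5 to 256), so, as far as I can tell, each item passes through the small-int check that CHECK_SMALL_INT / is_small_int implement. The sketch below repeats the timing loop with the stdlib timeit module instead of pyperf (its numbers are not comparable to the ones above) and adds a hypothetical helper, `is_cached_small_int`, that observes the cache. Int object identity is a CPython implementation detail, not a language guarantee.

```python
import timeit
from collections import deque

# Same setup as the pyperf command: deque(maxlen=0).extend exhausts an
# iterable without storing anything, so the loop mostly measures how fast
# range(256) produces its int objects.
consume = deque(maxlen=0).extend
r = range(256)

runs = timeit.repeat("consume(r)", globals={"consume": consume, "r": r},
                     number=10_000, repeat=5)
print(f"best: {min(runs) / 10_000 * 1e6:.2f} us per loop")


def is_cached_small_int(n: int) -> bool:
    """Hypothetical helper: True if CPython returns a shared int object for n."""
    # Parsing n twice gives the same object only if n comes from the cache.
    return int(str(n)) is int(str(n))
```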

--

___
Python tracker 
<https://bugs.python.org/issue37812>
___



[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)

2019-09-17 Thread Ma Lin


Ma Lin  added the comment:

> I agree that both changes should be reverted.

There is another commit after those two commits:
https://github.com/python/cpython/commit/c6734ee7c55add5fdc2c821729ed5f67e237a096

That makes reverting them troublesome.

PR 16146 is still in progress; maybe we can ask its author to replace
`Py_UNREACHABLE()` with `assert(0)`.

--

___
Python tracker 
<https://bugs.python.org/issue37812>
___



[issue38205] Python no longer compiles without small integer singletons

2019-09-18 Thread Ma Lin


Ma Lin  added the comment:

If Py_UNREACHABLE() is used inside an if-else branch of a static inline function
that should return a value, the compiler may emit a warning:
https://godbolt.org/z/YtcNSf

MSVC v19.14:
warning C4715: 'test': not all control paths return a value

clang 8.0.0:
warning: control may reach end of non-void function [-Wreturn-type]

Other compilers (gcc, icc) don't emit this warning.

This situation occurs in real code:
https://github.com/python/cpython/blob/v3.8.0b4/Include/object.h#L600
https://github.com/python/cpython/blob/v3.8.0b4/Objects/longobject.c#L3088

--

___
Python tracker 
<https://bugs.python.org/issue38205>
___



[issue38205] Python no longer compiles without small integer singletons

2019-09-18 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15860
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16270

___
Python tracker 
<https://bugs.python.org/issue38205>
___



[issue38205] Python no longer compiles without small integer singletons

2019-09-18 Thread Ma Lin


Ma Lin  added the comment:

PR 16270 uses Py_UNREACHABLE() on a single line.
It solves this particular issue.

--

___
Python tracker 
<https://bugs.python.org/issue38205>
___



[issue37812] Make implicit returns explicit in longobject.c (in CHECK_SMALL_INT)

2019-09-19 Thread Ma Lin

Ma Lin  added the comment:

Recent commits to longobject.c:

Revision: 5e63ab05f114987478a21612d918a1c0276fe9d2
Author: Greg Price 
Date: 19-8-25 1:19:37
Message:
bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return is 
explicit. (GH-15216)

The concern in this issue is the implicit return from a macro.
We could add a comment before each call site of the CHECK_SMALL_INT macro,
explaining that it may return.

Revision: 6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2
Author: animalize 
Date: 19-9-6 14:00:56
Message:
replace inline function `is_small_int` with a macro version (GH-15710)

With that approach, this commit would not be necessary.

Revision: c6734ee7c55add5fdc2c821729ed5f67e237a096
Author: Sergey Fedoseev 
Date: 19-9-12 22:41:14
Message:
bpo-37802: Slightly improve perfomance of PyLong_FromUnsigned*() (GH-15192)

This commit introduced a compiler warning due to this line [1]:
d:\dev\cpython\objects\longobject.c(412): warning C4244: 'function': conversion
from 'unsigned long' to 'sdigit', possible loss of data

[1] the line:
return get_small_int((ival)); \
https://github.com/python/cpython/blob/master/Objects/longobject.c#L386

Revision: 42acb7b8d29d078bc97b0cfd7c4911b2266b26b9
Author: HongWeipeng <961365...@qq.com>
Date: 19-9-18 23:10:15
Message:
bpo-35696: Simplify long_compare() (GH-16146)

IMO this commit reduces readability a bit.

We can sort out these problems.

--

___
Python tracker 
<https://bugs.python.org/issue37812>
___



[issue35696] remove unnecessary operation in long_compare()

2019-09-20 Thread Ma Lin

Ma Lin  added the comment:

> I'd fix them, but I'm not sure if we are going to restore CHECK_SMALL_INT() 
> ¯\_(ツ)_/¯

I suggest we slow down and carefully sort out the recent commits to longobject.c:
https://bugs.python.org/issue37812#msg352837

The goal is a consistent code style and better readability.

--

___
Python tracker 
<https://bugs.python.org/issue35696>
___



[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build

2019-09-22 Thread Ma Lin


New submission from Ma Lin :

The C type `long` is a 4-byte integer in 64-bit Windows builds. [1]

But the `ucs1lib_find_max_char()` function [2] uses SIZEOF_LONG, so it loses a
little performance in 64-bit Windows builds.

Below is the benchmark after switching to SIZEOF_SIZE_T together with this change:

-   unsigned long value = *(unsigned long *) _p;
+   size_t value = *(size_t *) _p;

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b=b'a'*10_000_000; f=b.decode;" "f('latin1')"

before: 5.83 ms +- 0.05 ms
after : 5.58 ms +- 0.06 ms

[1] https://stackoverflow.com/questions/384502

[2] 
https://github.com/python/cpython/blob/v3.8.0b4/Objects/stringlib/find_max_char.h#L9

There may be more optimizations possible, so I haven't prepared a PR for this yet.
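To illustrate why the word size matters: the C code tests several bytes per loop iteration by AND-ing a whole word against a mask that has the high bit of every byte set, so a 4-byte `unsigned long` needs twice as many iterations as an 8-byte `size_t`. The Python sketch below only mimics that idea; `is_ascii_wordwise` is a hypothetical helper, it ignores the pointer-alignment handling the C code needs, and it is far slower than `bytes.isascii()`.

```python
def is_ascii_wordwise(data: bytes, word_size: int = 8) -> bool:
    """Hypothetical helper: ASCII check that tests `word_size` bytes per step."""
    # Mask with the high bit of every byte set, e.g. 0x8080808080808080.
    mask = int.from_bytes(b"\x80" * word_size, "little")
    aligned_end = len(data) - len(data) % word_size
    for i in range(0, aligned_end, word_size):
        word = int.from_bytes(data[i:i + word_size], "little")
        if word & mask:  # some byte in this word is >= 0x80
            return False
    # Check the leftover tail one byte at a time.
    return all(b < 0x80 for b in data[aligned_end:])
```

With `word_size=4` the loop body runs twice as often as with `word_size=8` for the same input, which is the difference between `unsigned long` and `size_t` on 64-bit Windows.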

--
components: Interpreter Core
messages: 352970
nosy: Ma Lin, inada.naoki, serhiy.storchaka, sir-sigurd
priority: normal
severity: normal
status: open
title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build
type: performance
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38252>
___



[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build

2019-09-23 Thread Ma Lin


Ma Lin  added the comment:

Maybe @sir-sigurd can find more optimizations.

FYI, the `_Py_bytes_isascii()` function [1] also has similar code.
[1] https://github.com/python/cpython/blob/v3.8.0b4/Objects/bytes_methods.c#L104

--

___
Python tracker 
<https://bugs.python.org/issue38252>
___



[issue38252] micro-optimize ucs1lib_find_max_char in Windows 64-bit build

2019-09-23 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15911
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16334

___
Python tracker 
<https://bugs.python.org/issue38252>
___



[issue38252] Use 8-byte step to detect ASCII sequence in 64bit Windows builds

2019-09-23 Thread Ma Lin


Change by Ma Lin :


--
title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build -> Use 
8-byte step to detect ASCII sequence in 64bit Windows builds

___
Python tracker 
<https://bugs.python.org/issue38252>
___



[issue38252] Use 8-byte step to detect ASCII sequence in 64bit Windows builds

2019-09-23 Thread Ma Lin


Ma Lin  added the comment:

There are 4 functions with similar code; see PR 16334.
Simply replacing the `unsigned long` type with `size_t` gives these benchmarks.
Can this be backported to the 3.8 branch?

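1.  bytes.isascii()

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.isascii;" "f()"

+-----------+-----------+------------------------------+
| Benchmark | isascii_a | isascii_b                    |
+===========+===========+==============================+
| timeit    | 11.7 ms   | 7.84 ms: 1.50x faster (-33%) |
+-----------+-----------+------------------------------+

2.  bytes.decode('latin1')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('latin1')"

+-----------+----------+-----------------------------+
| Benchmark | latin1_a | latin1_b                    |
+===========+==========+=============================+
| timeit    | 60.3 ms  | 57.4 ms: 1.05x faster (-5%) |
+-----------+----------+-----------------------------+

3.  bytes.decode('ascii')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('ascii')"

+-----------+---------+-----------------------------+
| Benchmark | ascii_a | ascii_b                     |
+===========+=========+=============================+
| timeit    | 48.5 ms | 47.1 ms: 1.03x faster (-3%) |
+-----------+---------+-----------------------------+

4.  bytes.decode('utf8')

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b = b'x' * 100_000_000; f = b.decode;" "f('utf8')"

+-----------+---------+-----------------------------+
| Benchmark | utf8_a  | utf8_b                      |
+===========+=========+=============================+
| timeit    | 48.3 ms | 47.1 ms: 1.03x faster (-3%) |
+-----------+---------+-----------------------------+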

--

___
Python tracker 
<https://bugs.python.org/issue38252>
___



[issue38321] Compiler warnings when building Python 3.8

2019-09-30 Thread Ma Lin


Ma Lin  added the comment:

On my Windows machine, some non-ASCII characters cause this warning:

d:\dev\cpython\modules\expat\xmltok.c : warning C4819: 
The file contains a character that cannot be represented in
the current code page (936). Save the file in Unicode format
to prevent data loss.

This patch fixes the warnings; it applies to both the master and 3.8 branches.
https://github.com/animalize/cpython/commit/daced7575ec70ef1f888c6854760e230cda5ea64

This trivial problem may not be worth its own commit; it could be fixed along
with other warnings.
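One rough way to locate the lines behind warning C4819 is to list the lines that contain non-ASCII bytes since, as I understand it, those are the bytes MSVC tries to interpret in the active code page when the file has no BOM. `non_ascii_lines` is a hypothetical helper, not part of the patch above.

```python
def non_ascii_lines(source: bytes) -> list:
    """Hypothetical helper: 1-based numbers of lines containing a byte >= 0x80."""
    return [lineno
            for lineno, line in enumerate(source.splitlines(), start=1)
            if not line.isascii()]


# Usage (the path is illustrative):
# with open("Modules/expat/xmltok.c", "rb") as f:
#     print(non_ascii_lines(f.read()))
```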

--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue38321>
___



[issue38321] Compiler warnings when building Python 3.8

2019-09-30 Thread Ma Lin


Ma Lin  added the comment:

Other warnings:

c:\vstinner\python\master\objects\longobject.c(420): warning C4244: 'function': 
conversion from 'unsigned __int64' to 'sdigit', possible loss of data

c:\vstinner\python\master\objects\longobject.c(428): warning C4267: 'function': 
conversion from 'size_t' to 'sdigit', possible loss of data
-
These warnings only appear in the master branch; I will fix them at some point.
(https://bugs.python.org/issue35696#msg352903)

--

___
Python tracker 
<https://bugs.python.org/issue38321>
___



[issue38321] Compiler warnings when building Python 3.8

2019-10-01 Thread Ma Lin


Ma Lin  added the comment:

> This file is copied directly from https://github.com/libexpat/libexpat/
> project. Would you mind to propose your patch there?

OK, I will report it there.

--

___
Python tracker 
<https://bugs.python.org/issue38321>
___



[issue13153] IDLE 3.x on Windows exits when pasting non-BMP unicode

2019-10-03 Thread Ma Lin


Ma Lin  added the comment:

> Thus this breaks editing the physical line past the astral character. We 
> cannot do anything with this.

I tried it; sadly, the experience is not very good.

--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue13153>
___



[issue38056] Overhaul Error Handlers section in codecs documentation

2019-10-12 Thread Ma Lin


Ma Lin  added the comment:

PR 15732 became an overhaul:

- replace/backslashreplace/surrogateescape were wrongly described as
  encoding-only; in fact they can also be used for decoding.
- clarify the description of surrogatepass.
- add more detail to each handler's description.
- add two REPL examples.
- add index entries for the error handler names.
- add default parameter values in codecs.rst.
- improve the term "text encoding".

PR 15732 has a screenshot of the Error Handlers section.
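The behaviors listed above can be tried directly. This sketch uses the two characters picked earlier (ß U+00DF and ♬ U+266C) and only the standard error handler names; it is an illustration, not the REPL examples added by PR 15732.

```python
text = "\u00df\u266c"  # 'ß' and '♬'

# Encoding: neither character exists in ASCII.
for handler in ("strict", "ignore", "replace", "backslashreplace",
                "xmlcharrefreplace", "namereplace"):
    try:
        print(f"{handler:18} -> {text.encode('ascii', handler)!r}")
    except UnicodeEncodeError as exc:
        print(f"{handler:18} -> UnicodeEncodeError: {exc.reason}")

# Decoding: replace, backslashreplace and surrogateescape work here too.
data = text.encode("utf-8")  # b'\xc3\x9f\xe2\x99\xac'
for handler in ("ignore", "replace", "backslashreplace", "surrogateescape"):
    print(f"{handler:18} <- {data.decode('ascii', handler)!r}")

# surrogateescape round-trips arbitrary bytes through str.
assert data.decode("ascii", "surrogateescape").encode("ascii", "surrogateescape") == data

# surrogatepass lets the UTF-8 codec encode and decode lone surrogates.
lone = "\ud800"
assert lone.encode("utf-8", "surrogatepass").decode("utf-8", "surrogatepass") == lone
```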

--
components: +Unicode
nosy: +ezio.melotti, vstinner
title: Add examples for common text encoding Error Handlers -> Overhaul Error 
Handlers section in codecs documentation

___
Python tracker 
<https://bugs.python.org/issue38056>
___



[issue38582] Regular match overflow

2019-10-24 Thread Ma Lin


Change by Ma Lin :


--
nosy: +Ma Lin
type: security -> 

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue38582] Regular match overflow

2019-10-24 Thread Ma Lin


Ma Lin  added the comment:

A simpler reproducer:

```
import re

NUM = 99

# items = [ '(001)', '(002)', '(003)', ..., '(NUM)']
items = [r'(%03d)' % i for i in range(1, 1+NUM)]
pattern = '|'.join(items)

# repl = '\1\2\3...\NUM'
temp = ('\\' + str(i) for i in range(1, 1+NUM))
repl = ''.join(temp)

text = re.sub(pattern, repl, '(001)')
print(text)

# if NUM == 99
# output: (001)
# if NUM == 100
# output: (001@)
# if NUM == 101
# output: (001@A)
```

--
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue38582] re: backreference number in replace string can't >= 100

2019-10-24 Thread Ma Lin


Ma Lin  added the comment:

Backreference numbers in the replacement string can't be >= 100:
https://github.com/python/cpython/blob/v3.8.0/Lib/sre_parse.py#L1022-L1036

If nobody takes this, I will try to fix it tomorrow.

--
nosy: +serhiy.storchaka
title: Regular match overflow -> re: backreference number in replace string 
can't >= 100
versions: +Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue38582] re: backreference number in replace string can't >= 100

2019-10-24 Thread Ma Lin


Ma Lin  added the comment:

@veaba 
Posting only in English is fine.

> Is this actually needed?
Probably very few people dynamically generate such large patterns.

> However, \g<...> is not accepted in a pattern.
> in the "regex" module I added support for it in a pattern too.
Yes, backreference numbers in a pattern also can't be >= 100.
Supporting \g<...> in patterns is a good idea.

Fixing this issue may cause a backward-compatibility problem: the parser would
confuse backreference numbers with octal escapes.
Perhaps documenting the limit (<= 99) is enough.
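In the replacement string, the `\g<N>` form already avoids the ambiguity because it always names a group. A minimal sketch that reuses the pattern construction from the reproducer above with 101 groups: `\101` is read as an octal escape (chr(0o101) == 'A'), while `\g<101>` refers to group 101.

```python
import re

NUM = 101
# Groups 1..101, each matching its own zero-padded number.
pattern = "|".join(r"(%03d)" % i for i in range(1, NUM + 1))

# "\101" in a template is parsed as an octal escape, not as group 101.
octal = re.sub(pattern, r"\101", "(101)")

# "\g<101>" always means group 101.
named = re.sub(pattern, r"[\g<101>]", "(101)")

print(octal, named)
```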

--

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue38582] re: backreference number in replace string can't >= 100

2019-10-25 Thread Ma Lin


Ma Lin  added the comment:

Octal escape:
\ooo    Character with octal value ooo
As in Standard C, up to three octal digits are accepted.

It only accepts UCS1 characters (ooo <= 0o377):
>>> ord('\377')
255
>>> len('\378')
2
>>> '\378' == '\37' + '8'
True

IMHO this is not useful and creates confusion.
Maybe it could be deprecated at the language level.

--

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue38582] re: backreference number in replace string can't >= 100

2019-10-25 Thread Ma Lin


Ma Lin  added the comment:

> I'd still retain \0 as a special case, since it really is useful.

Yes, \0 may be widely used; I didn't think of that.
Changing it would be troublesome, so let's keep it as is.

--

___
Python tracker 
<https://bugs.python.org/issue38582>
___



[issue23692] Undocumented feature prevents re module from finding certain matches

2019-10-27 Thread Ma Lin


Change by Ma Lin :


--
nosy: +Ma Lin

___
Python tracker 
<https://bugs.python.org/issue23692>
___



[issue37527] Timestamp conversion on windows fails with timestamps close to EPOCH

2019-11-01 Thread Ma Lin


Ma Lin  added the comment:

issue29097 fixed a bug in `datetime.fromtimestamp()`.
But this issue is about `datetime.timestamp()`, which is not fixed yet.

--

___
Python tracker 
<https://bugs.python.org/issue37527>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-05 Thread Ma Lin


Ma Lin  added the comment:

ping

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43785] bz2 performance issue.

2021-04-09 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43785>
___



[issue43785] Remove RLock from BZ2File

2021-04-09 Thread Ma Lin


Ma Lin  added the comment:

This change is backward incompatible; it may silently break some code.

If someone really needs better performance, they can write a BZ2File class
without the RLock themselves; it should be easy.

FYI, the zlib module was added in 1997, the bz2 module in 2002, and the lzma
module in 2011. (Just curious about these years.)
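A minimal sketch of such a class, assuming only sequential, single-threaded reads are needed. `LocklessBZ2Reader` is hypothetical; its loop is loosely modeled on the read loop of the stdlib's private `_compression.DecompressReader`, but it takes no lock at all, and it raises on trailing garbage after the last stream instead of ignoring it as `bz2.BZ2File` does.

```python
import bz2
import io


class LocklessBZ2Reader(io.RawIOBase):
    """Hypothetical read-only .bz2 stream that takes no lock.

    Not thread-safe: do not share one instance between threads without
    external locking. Wrap it in io.BufferedReader for readline() etc.
    """

    def __init__(self, fileobj, chunk_size=64 * 1024):
        self._fp = fileobj
        self._chunk_size = chunk_size
        self._decomp = None  # no bz2 stream started yet

    def readable(self):
        return True

    def readinto(self, b):
        with memoryview(b) as view, view.cast("B") as out:
            if len(out) == 0:
                return 0
            while True:
                d = self._decomp
                if d is None or d.eof:
                    # Between streams: start the next concatenated stream, if any.
                    rest = d.unused_data if d is not None else b""
                    if not rest:
                        rest = self._fp.read(self._chunk_size)
                    if not rest:
                        return 0  # clean end of file
                    # Unlike BZ2File, trailing garbage here raises OSError.
                    d = self._decomp = bz2.BZ2Decompressor()
                    data = d.decompress(rest, len(out))
                elif d.needs_input:
                    raw = self._fp.read(self._chunk_size)
                    if not raw:
                        raise EOFError("compressed file ended before the end-of-stream marker")
                    data = d.decompress(raw, len(out))
                else:
                    # Output was capped last time; drain it without new input.
                    data = d.decompress(b"", len(out))
                if data:
                    out[:len(data)] = data
                    return len(data)


# Usage: two concatenated streams read back as one file.
raw_bytes = bz2.compress(b"hello\n" * 3) + bz2.compress(b"world\n")
with io.BufferedReader(LocklessBZ2Reader(io.BytesIO(raw_bytes))) as f:
    print(f.readlines())
```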

--

___
Python tracker 
<https://bugs.python.org/issue43785>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-11 Thread Ma Lin


Ma Lin  added the comment:

> I don't really _like_ that this is a .h file acting as a C template to inject
> effectively the same static code into each module that wants to use it...
> Which I think is the concern Victor is expressing in a comment above.

I think so too.

The BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX defines are ugly. If the core
code is put together in one place, these defines can move into thin wrappers in
the _bz2module.c/_lzmamodule.c/zlibmodule.c files. This can be done now, but
ideally it would be improved more thoroughly in 3.11.

_PyBytesWriter behaves differently: the user may access the existing data as
plain data, which is impossible with _BlocksOutputBuffer. If an API can be
carefully designed to be efficient, flexible, and elegant, the code could then
be used in other places in CPython.
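The core idea of _BlocksOutputBuffer can be sketched in Python. Instead of repeatedly resizing one output buffer, which may copy everything produced so far on each growth, output goes into a list of separately allocated blocks of increasing size that are joined once at the end. `decompress_into_blocks` and the `BLOCK_SIZES` schedule are hypothetical; the real code is C inside the bz2/lzma/zlib modules.

```python
import itertools
import zlib

# Hypothetical growth schedule; the real C block-size table differs.
BLOCK_SIZES = [32 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024]


def decompress_into_blocks(data: bytes) -> bytes:
    """Collect output in separate blocks, then join them once."""
    decomp = zlib.decompressobj()
    blocks = []
    pending = data
    for i in itertools.count():
        size = BLOCK_SIZES[min(i, len(BLOCK_SIZES) - 1)]
        block = decomp.decompress(pending, size)  # at most `size` bytes
        blocks.append(block)
        pending = decomp.unconsumed_tail          # input not yet processed
        if decomp.eof or (not pending and len(block) < size):
            break
    blocks.append(decomp.flush())  # anything still buffered inside zlib
    # The only bulk copy: one join at the end.
    return b"".join(blocks)
```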

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43787>
___



[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin


Ma Lin  added the comment:

I think this change is safe.

The behavior should be exactly the same, except that the iterators are
different objects (obj vs obj._buffer).

--

___
Python tracker 
<https://bugs.python.org/issue43787>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-24 Thread Ma Lin


Ma Lin  added the comment:

> The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put 
> the core code together, these defines can be put in a thin wrapper in 
> _bz2module.c/_lzmamodule.c/zlibmodule.c files.

I tried it, and it looks good.
I will update the PR within one or two days.
The code is more concise, and the review burden is small.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-26 Thread Ma Lin


Ma Lin  added the comment:

Very sorry for updating at the last moment.
But after this update we should not need to touch it in the future, so I think
it's worth it.

Please review the last commit in PR 21740; the previous commits have not been
changed.
IMO, reviewing may be more convenient with a Git client such as TortoiseGit.

The changes:

1. Move `Modules/blocks_output_buffer.h` to
`Include/internal/pycore_blocks_output_buffer.h`.
This keeps the `Modules` folder clean.

2. Require the user to initialize the struct instance like this, and check it
with assertions:
_BlocksOutputBuffer buffer = {.list = NULL};

Then the error-handling code no longer needs to worry about whether buffer.list
is uninitialized.
This costs an extra assignment, but it benefits long-term code maintenance.

3. Change the type of BUFFER_BLOCK_SIZE from `int` to `Py_ssize_t`.
This removes a few type casts from the core code.

4. These functions return the allocated size on success and -1 on failure:
_BlocksOutputBuffer_Init()
_BlocksOutputBuffer_InitAndGrow()
_BlocksOutputBuffer_InitWithSize()
_BlocksOutputBuffer_Grow()
This API is simpler if the code is reused elsewhere.

5. All functions are decorated with `inline`.
If the compiler is smart enough, it can eliminate some code when `max_length`
is a constant < 0.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41735] Thread locks in zlib module may go wrong in rare case

2021-04-27 Thread Ma Lin


Ma Lin  added the comment:

Thanks for the review.

--

___
Python tracker 
<https://bugs.python.org/issue41735>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-27 Thread Ma Lin

Ma Lin  added the comment:

The above changes were made in this commit:

split core code and wrappers
55705f6dc28ff4dc6183e0eb57312c885d19090a

After that commit there is a new one, which resolves the code conflicts
introduced one hour ago by PR 22126.

Merge branch 'master' into blocks_output_buffer
45d752649925765b1b3cf39e9045270e92082164

Sorry to complicate the review again.
I should have asked Łukasz Langa to merge PR 22126 after this issue was
resolved, since resolving the code conflicts in PR 22126 would have been easier.

For the changes from 55705f6 to 45d7526, see the uploaded file (45d7526.diff);
they can also be viewed easily with a Git client.

--
Added file: https://bugs.python.org/file49993/45d7526.diff

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-28 Thread Ma Lin


Ma Lin  added the comment:

Thanks for reviewing this big patch.
Your review makes the code better.

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +24429
pull_request: https://github.com/python/cpython/pull/25738

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin


Ma Lin  added the comment:

I found a backward-incompatible behavior.

Before the patch, in 64-bit builds, the zlib module allows an initial size >
UINT32_MAX.
It creates a bytes object and uses a sliding window to deal with the
UINT32_MAX limit:
https://github.com/python/cpython/blob/v3.9.4/Modules/zlibmodule.c#L183

After the patch, when init_size > UINT32_MAX, it raises a ValueError.

PR 25738 fixes this backward incompatibility.
If the initial size is > UINT32_MAX, it is clamped to UINT32_MAX rather than
raising an exception.

Moreover, if you don't mind, I would like to take this opportunity to rename
the wrapper functions from Buffer_* to OutputBuffer_*, so that readers can
easily distinguish between input buffers and output buffers.
If you don't think it's necessary, you may merge PR 25738 as is.
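The sliding-window arithmetic mentioned above can be sketched as follows: one large output buffer is allocated, and zlib's 32-bit `avail_out` is pointed at consecutive windows of at most UINT32_MAX bytes. `output_windows` is a hypothetical helper that only computes the (offset, length) pairs; it does not call zlib.

```python
UINT32_MAX = 2**32 - 1  # limit of zlib's 32-bit avail_out field


def output_windows(total_size: int, limit: int = UINT32_MAX):
    """Hypothetical helper: yield (offset, length) windows covering a buffer
    of `total_size` bytes, each no longer than `limit` bytes."""
    offset = 0
    while offset < total_size:
        length = min(limit, total_size - offset)
        yield offset, length
        offset += length
```

For a 10 GiB buffer this yields three windows: two of UINT32_MAX bytes and a final one of 2147483650 bytes.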

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-05-10 Thread Ma Lin


Ma Lin  added the comment:

Erlend, please take a look at this bug.

--

___
Python tracker 
<https://bugs.python.org/issue33376>
___



[issue44114] Incorrect function signatures in dictobject.c

2021-05-12 Thread Ma Lin


Change by Ma Lin :


--
nosy: +methane

___
Python tracker 
<https://bugs.python.org/issue44114>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +24779
pull_request: https://github.com/python/cpython/pull/26143

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin


Ma Lin  added the comment:

Sorry, I have a better solution for the (init_size > UINT32_MAX) problem.

Consider this scenario:
- before the patch
- in a 64-bit build
- using the zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)

Setting the `bufsize` argument to the decompressed size used to hit a fast
path:

zlib.decompress(data, bufsize=10*1024*1024*1024)

Fast path when (the initial size == the actual size):
https://github.com/python/cpython/blob/v3.9.5/Modules/zlibmodule.c#L424-L426

https://github.com/python/cpython/blob/v3.9.5/Objects/bytesobject.c#L3008-L3011

But in the current code, the initial size is clamped to UINT32_MAX, so there
are two regressions:

1. It allocates double the RAM (~20 GiB: the blocks plus the final bytes object).
2. It needs to memcpy from the blocks to the final bytes object.

PR 26143 uses a UINT32_MAX sliding window for the first block, so the initial
buffer size can now be greater than UINT32_MAX.

_BlocksOutputBuffer_Finish() already has a fast path for a single block.
Benchmarking this code:

zlib.decompress(data, bufsize=10*1024*1024*1024)

        time      RAM
before: 7.92 sec  ~20 GiB
after:  6.61 sec   10 GiB
(AMD 3600X, DDR4-3200, decompressed data is 10_GiB * b'a')

Some user code may rely on this corner case.
This should be the last revision; after it, there is no regression in any case.
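From the caller's side, the corner case looks like this. The sketch uses a small 8 MiB payload as a stand-in for the 10 GiB case. `bufsize` is a real parameter of `zlib.decompress()` (accepted as a keyword since Python 3.6); per the discussion above, passing the exact decompressed size lets the initial buffer already have the final size.

```python
import zlib

payload = b"a" * (8 * 1024 * 1024)  # small stand-in for the 10 GiB case
compressed = zlib.decompress and zlib.compress(payload)

# Default: start from a small buffer size and grow as needed.
default = zlib.decompress(compressed)

# Exact size known in advance: pass it as bufsize.
exact = zlib.decompress(compressed, bufsize=len(payload))

assert default == exact == payload
```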

--

___
Python tracker 
<https://bugs.python.org/issue41486>
___



[issue43650] MemoryError on zip.read in shutil._unpack_zipfile

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue43650>
___



[issue44134] lzma: stream padding in xz files

2021-05-15 Thread Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44134>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-16 Thread Ma Lin


New submission from Ma Lin :

Running this code raises an exception:

import pickle
import lzma
import pandas as pd
with lzma.open("test.xz", "wb") as file:
    pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

The exception:

Traceback (most recent call last):
  File "E:\testlen.py", line 7, in 
    pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
  File "D:\Python39\lib\lzma.py", line 234, in write
    self._pos += len(data)
TypeError: object of type 'pickle.PickleBuffer' has no len()

The exception is raised in the lzma.LZMAFile.write() method:
https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238

PickleBuffer doesn't have a .__len__ method; is this intended?
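The failure can be reproduced without pandas or lzma. In this minimal sketch, `pickle.PickleBuffer` exposes the buffer protocol but defines no length, so `len()` raises the same TypeError as in the traceback, while a `memoryview` over the same object still reports its size in bytes.

```python
import pickle

buf = pickle.PickleBuffer(bytearray(b"abcdef"))

# len() is not supported: PickleBuffer only implements the buffer protocol.
try:
    len(buf)
except TypeError as exc:
    print(exc)  # object of type 'pickle.PickleBuffer' has no len()

# The byte count is still available through a memoryview.
with memoryview(buf) as view:
    print(view.nbytes)  # 6
```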

--
messages: 395971
nosy: malin, pitrou
priority: normal
severity: normal
status: open
title: PickleBuffer doesn't have __len__ method

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-16 Ma Lin


Ma Lin  added the comment:

Ok, I'm working on a PR.

--

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +25350
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26764

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44439] stdlib wrongly uses len() for bytes-like object

2021-06-21 Ma Lin


Ma Lin  added the comment:

I am checking all the .py files in the `Lib` folder.
hmac.py has two len() bugs:
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214

I think PR 26764 is ready; it fixes the len() bugs in bz2.py and lzma.py.
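The general pattern behind these fixes can be sketched with a hypothetical helper, `byte_length()` (not a name taken from PR 26764), that measures any bytes-like object through a `memoryview`. For objects whose items are wider than one byte, `len()` counts items rather than bytes, so it gives the wrong size even when it does not raise.

```python
import array
import pickle


def byte_length(data) -> int:
    """Return the size in bytes of any bytes-like object (hypothetical helper)."""
    with memoryview(data) as view:
        return view.nbytes


raw = b"abcdef"
doubles = array.array("d", [1.0, 2.0, 3.0])
buf = pickle.PickleBuffer(raw)

print(byte_length(raw))                    # 6, same as len(raw)
print(len(doubles), byte_length(doubles))  # 3 items vs. 3 * doubles.itemsize bytes
print(byte_length(buf))                    # 6, although len(buf) raises TypeError
```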

--
nosy: +christian.heimes
title: PickleBuffer doesn't have __len__ method -> stdlib wrongly uses len() 
for bytes-like object

___
Python tracker 
<https://bugs.python.org/issue44439>
___



[issue44458] Duplicate symbol _BUFFER_BLOCK_SIZE when statically linking multiple modules

2021-06-21 Ma Lin


Change by Ma Lin :


--
nosy: +malin

___
Python tracker 
<https://bugs.python.org/issue44458>
___


