Ma Lin added the comment:
These can be done in .__new__() method:
- create thread lock
- create (de)?compression context
- initialize (de)?compressor states
In the .__init__() method, only set the (de)?compression parameters, and prevent
.__init__() from being called multiple times.
This
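A minimal Python sketch of the split described above. The `Compressor` class and its attribute names are hypothetical stand-ins; the actual change concerns the C extension types, so this only illustrates the idea of doing one-time setup in `__new__` and rejecting a second `__init__` call.

```python
import threading


class Compressor:
    """Hypothetical stand-in for a C-level (de)compressor type."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # One-time setup goes here: it runs exactly once per object.
        self._lock = threading.Lock()
        self._initialized = False
        return self

    def __init__(self, level=6):
        # Only parameters are set here; a second call is rejected.
        if self._initialized:
            raise RuntimeError("__init__ method is called twice")
        self._level = level
        self._initialized = True
```

With this layout, the lock always exists by the time any method runs, even if a subclass forgets to call `__init__`.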
New submission from Ma Lin :
These methods are METH_NOARGS; in all cases the second parameter will be NULL.
{"_checkClosed", _PyIOBase_check_closed, METH_NOARGS},
{"_checkSeekable", _PyIOBase_check_seekable, METH_NOARGS},
{"_checkReadable", _PyIOBa
Change by Ma Lin :
--
keywords: +patch
pull_requests: +28606
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/30397
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
--
resolution: -> not a bug
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.or
New submission from Ma Lin :
Since CPython 3.0.0, the checksums are always truncated to `unsigned int`:
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930
https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950
--
assignee: docs@python
components
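A small check of the behavior described above, using only `zlib.crc32`, `zlib.adler32` and `binascii.crc32` from the standard library. The sample data is arbitrary; the point is that the results are unsigned 32-bit values.

```python
import binascii
import zlib

data = b"hello world"

crc = zlib.crc32(data)
adler = zlib.adler32(data)

# Both checksums fit in an unsigned 32-bit integer.
crc_in_range = 0 <= crc <= 0xFFFFFFFF
adler_in_range = 0 <= adler <= 0xFFFFFFFF

# binascii.crc32 computes the same CRC-32 value.
same_crc = binascii.crc32(data) == crc

# A running checksum can be continued by passing the previous value.
chained = zlib.crc32(b"world", zlib.crc32(b"hello "))
```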
Change by Ma Lin :
--
keywords: +patch
pull_requests: +30046
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/31955
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
The `binascii.crc32` documentation also contains this invalid statement:
doc: https://docs.python.org/3/library/binascii.html#binascii.crc32
3.0.0 code: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035
In addition, `binascii.crc32` has a `USE_ZLIB_CRC32` code path
Ma Lin added the comment:
`_Stream.write` method in tarfile.py also has this code:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434
But this bug will not be triggered, because this method is always called with
bytes data.
`_ConnectionBase.send_bytes` method in
Change by Ma Lin :
--
pull_requests: +30090
pull_request: https://github.com/python/cpython/pull/32002
___
Python tracker
<https://bugs.python.org/issue47
Ma Lin added the comment:
PR 32002 is for 3.10/3.9 branches.
--
___
Python tracker
<https://bugs.python.org/issue47040>
___
___
Python-bugs-list mailin
Ma Lin added the comment:
If you run this code, would it be slower?
bytes_hash = hash(bytes_data)
bytes_hash = hash(bytes_data) # get hash twice
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue46
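A rough way to measure the question above, sketched with `timeit`. The 1 MiB buffer size and the repeat count are arbitrary assumptions; the timing depends on whether the interpreter caches the hash inside the bytes object, so no expected numbers are given.

```python
import timeit

bytes_data = bytes(1024 * 1024)  # 1 MiB of zero bytes (arbitrary size)


def hash_twice():
    h1 = hash(bytes_data)
    h2 = hash(bytes_data)  # get hash twice
    return h1, h2


# Total time in seconds for 100 calls of hash_twice().
elapsed = timeit.timeit(hash_twice, number=100)
print(f"{elapsed:.6f} s for 100 calls")
```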
Ma Lin added the comment:
Since hash() is a public function, some users may use the hash value to manage
bytes objects in their own way, and then there may be a performance regression.
As a rough example, dispatching data to 16 servers:
h = hash(b)
sendto(server_number=h & 0xF, da
Ma Lin added the comment:
RAM is now relatively cheaper than CPU.
1 million bytes objects additionally use 7.629 MiB of RAM for ob_shash
(1_000_000*8/1024/1024).
This causes hash() performance regression anyway.
--
___
Python tracker
<ht
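The memory figure above can be reproduced with a few lines of arithmetic. The 8-byte field size is an assumption that holds for a 64-bit build.

```python
n_objects = 1_000_000
hash_field_bytes = 8  # assumption: 64-bit build, 8-byte hash field

extra_mib = n_objects * hash_field_bytes / 1024 / 1024
print(f"{extra_mib:.3f} MiB")
```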
Ma Lin added the comment:
If a bytes object is put into multiple dicts/sets, the hash needs to be computed
multiple times. This seems to be a common usage.
bytes is a very basic type, users may use it in various ways. And unskilled
users may check the same bytes object against dicts/sets many
Ma Lin added the comment:
> I posted remove-bytes-hash.patch in this issue. Would you measure how this
> affects whole application performance rather than micro benchmarks?
I guess not much difference in benchmarks.
But if a bytes object is put into multiple dicts/sets, and len(bytes_k
Ma Lin added the comment:
Thanks for your review.
3.11 has a more powerful re module, also thank you for rebasing the atomic
grouping code.
--
___
Python tracker
<https://bugs.python.org/issue35
Ma Lin added the comment:
The approaches in my PRs are suboptimal, so I closed them.
The number of REPEATs can be counted when compiling a pattern, and a
`SRE_REPEAT` array with that many items can be allocated in `SRE_STATE`.
It seems that at any time, a REPEAT will only have one in active, so a `SRE_REPEAT
Ma Lin added the comment:
Please don't merge too close to the 3.11 beta1 release date; I'll submit PRs
after this is merged.
--
___
Python tracker
<https://bugs.python.o
Change by Ma Lin :
--
pull_requests: +30265
pull_request: https://github.com/python/cpython/pull/32188
___
Python tracker
<https://bugs.python.org/issue23
Change by Ma Lin :
--
pull_requests: +30266
pull_request: https://github.com/python/cpython/pull/32188
___
Python tracker
<https://bugs.python.org/issue47
Change by Ma Lin :
--
pull_requests: +30298
pull_request: https://github.com/python/cpython/pull/32223
___
Python tracker
<https://bugs.python.org/issue23
New submission from Ma Lin :
`bytes(m)` can be replaced by `memoryview.cast('B')`, which avoids copying the
data.
m = memoryview(buf)
# HACK for byte-indexing of non-bytewise buffers (e.g. array.array)
if m.itemsize > 1:
m = memoryview(bytes(
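A minimal sketch of the `cast('B')` suggestion, using an `array.array` as the non-bytewise buffer. The variable names are illustrative only; the cast gives byte-level indexing over the same memory instead of a copy.

```python
import array

buf = array.array("i", [1, 2, 3])
m = memoryview(buf)

# Byte-indexed view over the same memory; no data is copied.
byte_view = m.cast("B")

same_size = len(byte_view) == m.nbytes

# A change to the array is visible through the view, showing that
# the view shares memory with the original buffer.
buf[0] = 42
shares_memory = bytes(byte_view) == buf.tobytes()
```

Note that `cast()` requires a C-contiguous buffer, so this is a sketch for the common case rather than a drop-in replacement for every buffer type.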
Change by Ma Lin :
--
keywords: +patch
pull_requests: +30318
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/32247
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
In the `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h files.
Will they be put into a folder?
--
___
Python tracker
<https://bugs.python.org/issue47
Change by Ma Lin :
--
pull_requests: +30344
pull_request: https://github.com/python/cpython/pull/32283
___
Python tracker
<https://bugs.python.org/issue23
Ma Lin added the comment:
Match.regs is an undocumented attribute, it seems it has existed since 1991.
Can it be removed?
https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871
--
___
Python
Ma Lin added the comment:
> cryptic name
In very early versions, "mark" was called register/region.
https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52
If span is accessed repeatedly, it's faster than Match.span().
Maybe consider renaming it, and
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue47211>
Ma Lin added the comment:
Could you give the two versions? I will do a git bisect.
I tested 356997c~1 and 356997c [1], msvc2022 non-pgo release build:
### regex_dna ###
Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower
Not significant
### regex_effbot ###
Mean +- std dev: 2.47
New submission from Ma Lin :
These changes reduce sizeof(match_context):
- 32-bit build: 36 bytes, no change.
- 64-bit build: 72 bytes -> 56 bytes.
sre uses a stack and the `match_context` struct to simulate recursive calls; a
smaller struct brings:
- deeper recursive calls
- less memory cons
Change by Ma Lin :
--
keywords: +patch
pull_requests: +30437
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/32411
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
> Possibly related to the new atomic grouping support from GH-31982?
It seems unlikely.
I will do some benchmarks for this issue, more information (version/platform)
is welcome.
--
___
Python tracker
<
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue37907>
New submission from Ma Lin :
Commit 5e63ab0 replaces macro with this inline function:
static inline int
is_small_int(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}
(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)
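For illustration, the same range check written in Python with the default constants quoted above. This mirrors the C predicate only; it does not inspect the interpreter's actual cache.

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257


def is_small_int(ival: int) -> bool:
    # Same condition as the C inline function: [-5, 257)
    return -NSMALLNEGINTS <= ival < NSMALLPOSINTS


boundaries = {n: is_small_int(n) for n in (-6, -5, 0, 256, 257)}
print(boundaries)
```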
Ma Lin added the comment:
There will always be a new commit, replacing with a macro version also looks
good.
I have no opinion, both are fine.
--
___
Python tracker
<https://bugs.python.org/issue38
New submission from Ma Lin :
Adding these two lines to /Objects/longobject.c will disable the "preallocated
small integer pool":
#define NSMALLPOSINTS 0
#define NSMALLNEGINTS 0
Then run this reproducer (attached):
from enum import IntEnum
import _signal
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15355
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15701
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
I did a Git bisect, this is the first bad commit:
https://github.com/python/cpython/commit/9541bd321a94f13dc41163a5d7a1a847816fac84
Adding the people involved to the nosy list.
--
nosy: +berker.peksag, nanjekyejoannah
___
Python tracker
<ht
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15365
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15710
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
Revert commit 5e63ab0 or use PR 15710, both are fine.
--
___
Python tracker
<https://bugs.python.org/issue38015>
Ma Lin added the comment:
This range has not been changed since "preallocated small integer pool" was
introduced:
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
The commit (Jan 2007):
https://github.com/python/cpython/commit/ddefaf31b366ea84250fc5090837c2b764a04102
I
Change by Ma Lin :
--
title: Assertion failed: object has negative ref count -> reference counter
issue in signal module
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue26868>
Ma Lin added the comment:
> This change produces tiny, but measurable speed-up for handling small ints
I didn't get a measurable change. I ran this command a dozen times and took the
best result:
D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "from collections
New submission from Ma Lin :
Text descriptions about `Error Handlers` are not very friendly to novices.
https://docs.python.org/3/library/codecs.html#error-handlers
For example:
'xmlcharrefreplace'
Replace with the appropriate XML character reference (only for encoding).
I
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15386
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15732
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
--
pull_requests: +15407
pull_request: https://github.com/python/cpython/pull/15753
___
Python tracker
<https://bugs.python.org/issue38
Ma Lin added the comment:
PR 15710 has been merged into the master, but the merge message is not shown
here.
Commit:
https://github.com/python/cpython/commit/6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2
Maybe this issue can be closed.
--
resolution: -> fixed
stage: patch rev
Ma Lin added the comment:
Some memos:
1. In liblzma, these missing bytes were copied inside the `dict_repeat` function:
case SEQ_COPY:
    // Repeat len bytes from distance of rep0.
    if (unlikely(dict_repeat(&dict, rep0, &len))) {
See l
Ma Lin added the comment:
This commit changed Py_UNREACHABLE() five days ago:
https://github.com/python/cpython/commit/3ab61473ba7f3dca32d779ec2766a4faa0657923
If this change is removed, it compiles successfully.
--
nosy: +Ma Lin
___
Python
Ma Lin added the comment:
We can change Py_UNREACHABLE() to assert(0) in longobject.c
Or remove the article in Py_UNREACHABLE()
--
___
Python tracker
<https://bugs.python.org/issue38
Ma Lin added the comment:
> It's not clear to me if anyone benchmarked to see if the
> conversion to a macro had any measurable performance benefit.
I tested on that day, also using this command:
python.exe -m pyperf timeit -s "from collections import deque; consume =
deque(
Ma Lin added the comment:
> I agree that both changes should be reverted.
There is another commit after the two commits:
https://github.com/python/cpython/commit/c6734ee7c55add5fdc2c821729ed5f67e237a096
It is troublesome to revert them.
PR 16146 is on-going, maybe we can request the aut
Ma Lin added the comment:
If a static inline function is used, and Py_UNREACHABLE() is inside an if-else
branch that should return a value, the compiler may emit a warning:
https://godbolt.org/z/YtcNSf
MSVC v19.14:
warning C4715: 'test': not all control paths return a value
c
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15860
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16270
___
Python tracker
<https://bugs.python.org/issu
Ma Lin added the comment:
PR 16270 uses Py_UNREACHABLE() in a single line.
It solves this particular issue.
--
___
Python tracker
<https://bugs.python.org/issue38
Ma Lin added the comment:
Recent commits for longobject.c
Revision: 5e63ab05f114987478a21612d918a1c0276fe9d2
Author: Greg Price
Date: 19-8-25 1:19:37
Message:
bpo-37812: Convert CHECK_SMALL_INT macro to a function so the return is
explicit. (GH-15216)
The concern for
Ma Lin added the comment:
> I'd fix them, but I'm not sure if we are going to restore CHECK_SMALL_INT()
> ¯\_(ツ)_/¯
I suggest we slow down, carefully sort out the recent commits for longobject.c:
https://bugs.python.org/issue37812#msg352837
Make the code has consiste
New submission from Ma Lin :
The C type `long` is a 4-byte integer in 64-bit Windows builds. [1]
But the `ucs1lib_find_max_char()` function [2] uses SIZEOF_LONG, so it loses a
little performance in 64-bit Windows builds.
Below is the benchmark of using SIZEOF_SIZE_T and this change:
- unsigned
Ma Lin added the comment:
Maybe @sir-sigurd can find more optimizations.
FYI, `_Py_bytes_isascii()` function [1] also has similar code.
[1] https://github.com/python/cpython/blob/v3.8.0b4/Objects/bytes_methods.c#L104
--
___
Python tracker
<ht
Change by Ma Lin :
--
keywords: +patch
pull_requests: +15911
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16334
___
Python tracker
<https://bugs.python.org/issu
Change by Ma Lin :
--
title: micro-optimize ucs1lib_find_max_char in Windows 64-bit build -> Use
8-byte step to detect ASCII sequence in 64bit Windows builds
___
Python tracker
<https://bugs.python.org/issu
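A Python sketch of the idea in the new title, offered as an illustration rather than the C implementation. The helper name `is_ascii_wordwise` and its `step` parameter are hypothetical; the technique is to test the high bit of several bytes at once with a mask of `0x80` bytes. The `ctypes` lines show the type sizes on the current platform, which is where `long` and `size_t` can differ.

```python
import ctypes

# On 64-bit Windows, c_long is 4 bytes while c_size_t is 8 bytes.
long_size = ctypes.sizeof(ctypes.c_long)
size_t_size = ctypes.sizeof(ctypes.c_size_t)


def is_ascii_wordwise(data: bytes, step: int = 8) -> bool:
    """Check `step` bytes at a time for any byte with the high bit set."""
    mask = int.from_bytes(b"\x80" * step, "little")
    for i in range(0, len(data), step):
        # A short final chunk yields a smaller integer; the mask
        # still covers the high bit of every byte that is present.
        chunk = int.from_bytes(data[i:i + step], "little")
        if chunk & mask:
            return False
    return True


print(long_size, size_t_size)
```

A larger step means fewer loop iterations for the same input, which is the reason the choice between a 4-byte and an 8-byte word matters in the C code.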
Ma Lin added the comment:
There are 4 functions that have similar code, see PR 16334.
I just replaced the `unsigned long` type with the `size_t` type and got these benchmarks.
Can this be backported to 3.8 branch?
1. bytes.isascii()
D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "b
Ma Lin added the comment:
On my Windows, some non-ASCII characters cause this warning:
d:\dev\cpython\modules\expat\xmltok.c : warning C4819:
The file contains a character that cannot be represented in
the current code page (936). Save the file in Unicode format
to prevent
Ma Lin added the comment:
Other warnings:
c:\vstinner\python\master\objects\longobject.c(420): warning C4244: 'function':
conversion from 'unsigned __int64' to 'sdigit', possible loss of data
c:\vstinner\python\master\objects\longobject.c(428): warning C4267
Ma Lin added the comment:
> This file is copied directly from https://github.com/libexpat/libexpat/ >
> project. Would you mind to propose your patch there?
OK, I will report it there.
--
___
Python tracker
<https://bugs.python.or
Ma Lin added the comment:
> Thus this breaks editing the physical line past the astral character. We
> cannot do anything with this.
I tried it; sadly, the experience is not very good.
--
nosy: +Ma Lin
___
Python tracker
<https://b
Ma Lin added the comment:
PR 15732 became an overhaul:
- replace/backslashreplace/surrogateescape were wrongly described as encoding
only, in fact they can also be used in decoding.
- clarify the description of surrogatepass.
- add more descriptions to each handler.
- add two REPL examples
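A few short examples of the handlers mentioned in this list, using only `bytes.decode` and `str.encode`. The sample byte `0xff` is chosen because it is never valid in UTF-8.

```python
bad = b"\xff"  # not valid UTF-8

# These three handlers also work when decoding.
replaced = bad.decode("utf-8", "replace")
escaped = bad.decode("utf-8", "backslashreplace")
surrogate = bad.decode("utf-8", "surrogateescape")

# surrogateescape round-trips the original byte when encoding again.
round_trip = surrogate.encode("utf-8", "surrogateescape")

# xmlcharrefreplace is for encoding only.
xml_ref = "\u00e9".encode("ascii", "xmlcharrefreplace")
```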
Change by Ma Lin :
--
nosy: +Ma Lin
type: security ->
___
Python tracker
<https://bugs.python.org/issue38582>
Ma Lin added the comment:
A simpler reproducer:
```
import re
NUM = 99
# items = [ '(001)', '(002)', '(003)', ..., '(NUM)']
items = [r'(%03d)' % i for i in range(1, 1+NUM)]
pattern = '|'.join(items)
# repl = '\1
Ma Lin added the comment:
The backreference number in a replacement string can't be >= 100:
https://github.com/python/cpython/blob/v3.8.0/Lib/sre_parse.py#L1022-L1036
If no one takes this, I will try to fix this issue tomorrow.
--
nosy: +serhiy.storchaka
title: Regular match overfl
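For context, the documented `\g<number>` form in a replacement string can refer to group 100. The pattern and text below are made up for illustration, in the spirit of the reproducer in this thread.

```python
import re

NUM = 100

# pattern: '(001)(002)...(100)', text: '001002...100'
pattern = "".join("(%03d)" % i for i in range(1, NUM + 1))
text = "".join("%03d" % i for i in range(1, NUM + 1))

# \g<100> unambiguously refers to group 100.
result = re.sub(pattern, r"\g<100>", text)
print(result)
```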
Ma Lin added the comment:
@veaba
Posting only in English is fine.
> Is this actually needed?
Maybe very very few people dynamically generate some large patterns.
> However, \g<...> is not accepted in a pattern.
> in the "regex" module I added support for it in a patter
Ma Lin added the comment:
Octal escape:
\ooo    Character with octal value ooo
As in Standard C, up to three octal digits are accepted.
It only accepts UCS1 characters (ooo <= 0o377):
>>> ord('\377')
255
>>> len('\378')
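The two REPL lines above can be spelled out as follows. Since `8` is not an octal digit, the escape in the second literal stops after `\37` and the `8` becomes a separate character.

```python
max_octal = "\377"   # largest three-digit octal escape
two_chars = "\378"   # parsed as "\37" followed by "8"

print(ord(max_octal), len(two_chars))
```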
Ma Lin added the comment:
> I'd still retain \0 as a special case, since it really is useful.
Yes, maybe \0 is used widely, I didn't think of it.
Changing is troublesome, let's keep it as is.
--
___
Python tracker
<ht
Change by Ma Lin :
--
nosy: +Ma Lin
___
Python tracker
<https://bugs.python.org/issue23692>
Ma Lin added the comment:
issue29097 fixed a bug in `datetime.fromtimestamp()`.
But this issue is about `datetime.timestamp()`, which is not fixed yet.
--
___
Python tracker
<https://bugs.python.org/issue37
Ma Lin added the comment:
ping
--
___
Python tracker
<https://bugs.python.org/issue41486>
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue43785>
Ma Lin added the comment:
This change is backwards incompatible, it may break some code silently.
If someone really needs better performance, they can write a BZ2File class
without RLock by themselves, it should be easy.
FYI, zlib module was added in 1997, bz2 module was added in 2002, lzma
Ma Lin added the comment:
> I don't really _like_ that this is a .h file acting as a C template to inject
> effectively the same static code into each module that wants to use it...
> Which I think is the concern Victor is expressing in a comment above.
I think so too.
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue43787>
Ma Lin added the comment:
I think this change is safe.
The behaviors should be exactly the same, except the iterators are different
objects (obj vs obj._buffer).
--
___
Python tracker
<https://bugs.python.org/issue43
Ma Lin added the comment:
> The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put
> the core code together, these defines can be put in a thin wrapper in
> _bz2module.c/_lzmamodule.c/zlibmodule.c files.
I tried it, and it looks good.
I will update the PR within o
Ma Lin added the comment:
Very sorry for the update at the last moment.
But after the update, we should not need to touch it in the future, so I think
it's worth it.
Please review the last commit in PR 21740, the previous commits have not been
changed.
IMO if use a Git client such as Tortoi
Ma Lin added the comment:
Thanks for the review.
--
___
Python tracker
<https://bugs.python.org/issue41735>
Ma Lin added the comment:
The above changes were made in this commit:
split core code and wrappers
55705f6dc28ff4dc6183e0eb57312c885d19090a
After that commit, there is a new commit, it resolves the code conflicts
introduced by PR 22126 one hour ago.
Merge branch 'm
Ma Lin added the comment:
Thanks for reviewing this big patch.
Your review makes the code better.
--
___
Python tracker
<https://bugs.python.org/issue41
Change by Ma Lin :
--
pull_requests: +24429
pull_request: https://github.com/python/cpython/pull/25738
___
Python tracker
<https://bugs.python.org/issue41
Ma Lin added the comment:
Found a backward incompatible behavior.
Before the patch, in 64-bit build, zlib module allows the initial size >
UINT32_MAX.
It creates a bytes object, and uses a sliding window to deal with the
UINT32_MAX limit:
https://github.com/python/cpython/blob/v3.
Ma Lin added the comment:
Erlend, please take a look at this bug.
--
___
Python tracker
<https://bugs.python.org/issue33376>
Change by Ma Lin :
--
nosy: +methane
___
Python tracker
<https://bugs.python.org/issue44114>
Change by Ma Lin :
--
pull_requests: +24779
pull_request: https://github.com/python/cpython/pull/26143
___
Python tracker
<https://bugs.python.org/issue41
Ma Lin added the comment:
Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution.
Please imagine this scenario:
- before the patch
- in 64-bit build
- use zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)
If set the `b
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue43650>
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue44134>
New submission from Ma Lin :
If you run this code, it raises an exception:
import pickle
import lzma
import pandas as pd
with lzma.open("test.xz", "wb") as file:
    pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
The exception:
Tr
Ma Lin added the comment:
Ok, I'm working on a PR.
--
___
Python tracker
<https://bugs.python.org/issue44439>
Change by Ma Lin :
--
keywords: +patch
pull_requests: +25350
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/26764
___
Python tracker
<https://bugs.python.org/issu
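A small sketch of the kind of len() pitfall this thread appears to be about, assuming the problem is that `len()` on a bytes-like object counts items rather than bytes. The `array.array` here is just a convenient buffer with an item size larger than one byte.

```python
import array

data = array.array("i", range(10))
view = memoryview(data)

n_items = len(view)      # number of items, not bytes
n_bytes = view.nbytes    # size of the underlying data in bytes

# Casting to bytes makes len() agree with nbytes.
n_bytes_via_cast = len(view.cast("B"))
print(n_items, n_bytes)
```

Code that tracks a file position or a buffer size from written data would want the byte count, which is why `nbytes` is the safer choice for arbitrary bytes-like input.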
Ma Lin added the comment:
I am checking all the .py files in `Lib` folder.
hmac.py has two len() bugs:
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214
I think PR 26764 is prepared, it fixes the len() bugs in
Change by Ma Lin :
--
nosy: +malin
___
Python tracker
<https://bugs.python.org/issue44458>