New submission from STINNER Victor :
To decode byte string from the locale encoding (LC_CTYPE),
PyUnicode_DecodeFSDefault() can be used, but this function uses a constant
encoding set at startup (the locale encoding at startup). The right method is
currently to call _Py_char2wchar() and then
Changes by STINNER Victor :
--
keywords: +patch
Added file: http://bugs.python.org/file23886/pyunicode_decodelocale.patch
___
Python tracker
<http://bugs.python.org/issue13
STINNER Victor added the comment:
I collected the list of locales triggering the mbstowcs() bug thanks to my previous
commit:
* hu_HU (ISO8859-2): character U+3020
* de_AT (ISO8859-1): character U+3076
* cs_CZ (ISO8859-2): character U+3020
* sk_SK (ISO8859-2): character U+3020
STINNER Victor added the comment:
@Serg Asminog: What is your Python version? What is your locale encoding
(print(sys.getfilesystemencoding()))? What is your Windows version?
--
___
Python tracker
<http://bugs.python.org/issue4
New submission from STINNER Victor :
[333/363] test_multiprocessing
Timeout (1:00:00)!
Thread 0x000112d0b000:
File
"/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py",
line 411 in _recv
File
"/Users/buildbot/buildarea/3.x.parc-snow
STINNER Victor added the comment:
I didn't see this failure again since the issue was opened, so I close it as
invalid.
--
resolution: -> invalid
status: open -> closed
___
Python tracker
<http://bugs.python.
STINNER Victor added the comment:
The Solaris buildbot is green, let's close it. I didn't report the bug
upstream. Feel free to report it to Oracle!
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<htt
STINNER Victor added the comment:
Oops, it's not sys.getfilesystemencoding(), but locale.getpreferredencoding()
which is interesting. Can you give me your locale encoding?
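
To make the distinction between the two calls concrete, here is a minimal sketch that reports both values side by side. The helper name describe_encodings is hypothetical and exists only for illustration; the two calls it wraps are standard library functions.

```python
import locale
import sys


def describe_encodings():
    """Return the filesystem encoding and the locale's preferred encoding."""
    return {
        # Encoding Python uses to convert filenames between str and bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the locale prefers for text data; passing False avoids
        # a temporary setlocale() call.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

The two values can differ, which is why the question above asks for the second one.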
--
___
Python tracker
<http://bugs.python.org/i
STINNER Victor added the comment:
> I wrote down when I set up the OpenIndiana buildbots
Hum, please use the issue #13552 for curses issues on OpenIndiana/Solaris.
> ... of functions: "mvwchgat" and "wchgat"
See issues #3786 and #13552 for this problem.
> I insta
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue5905>
___
___
Python-bugs-list
STINNER Victor added the comment:
I fixed issue #5905 (strptime fails in non-UTF locale). The fix is not enough
if the locale is changed in Python.
Update the patch to fix time.strftime() (if wcsftime() is not available).
--
Added file: http://bugs.python.org/file23894
STINNER Victor added the comment:
The FreeBSD 7.2 3.x buildbot is green.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/i
STINNER Victor added the comment:
On 09/12/2011 22:12, Stefan Krah wrote:
> The bottleneck in _decimal is (res is ascii):
>
> PyUnicode_FromString(res);
>
> PyUnicode_DecodeASCII(res) has the same performance.
>
>
> With this function ...
>
>static PyObj
New submission from STINNER Victor :
http://www.python.org/dev/buildbot/all/builders/ARM%20Ubuntu%203.x/builds/143/steps/test/logs/stdio
---
test test_curses crashed -- Traceback (most recent call last):
File
"/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/test/regrte
STINNER Victor added the comment:
The compilation of the module failed for the same reason:
building '_curses' extension
gcc -pthread -fPIC -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes
-DHAVE_NCURSESW=1 -I/usr/include/ncursesw -IInclude -I. -I./Include
-I/usr/include/arm-lin
STINNER Victor added the comment:
The problem comes maybe from the name of a curses key, keyname().
PyInit__curses() gets the name of all keys (KEY_MIN..KEY_MAX).
--
___
Python tracker
<http://bugs.python.org/issue13
STINNER Victor added the comment:
Hum, it's still not ok:
==
FAIL: test_tzset (test.test_time.TimeTestCase)
--
Traceback (most recent call last):
STINNER Victor added the comment:
Can you please write a doc patch?
--
___
Python tracker
<http://bugs.python.org/issue13561>
___
STINNER Victor added the comment:
.. versionchanged:: 3.2
- The *strict* parameter is deprecated. HTTP 0.9-style "Simple Responses"
+ The *strict* parameter is removed. HTTP 0.9-style "Simple Responses"
are not supported anymore.
Such change looks wrong:
STINNER Victor added the comment:
@Barry: can you try to get a trace using gdb? Start python in gdb, set a
breakpoint on PyErr_SetObject, continue, run the Python command "import
_curses", get the gdb traceback (or continue if the error is not the UT
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue13539>
___
STINNER Victor added the comment:
> How different is the performance cost of this solution compared
> to inserting DTrace probe for the same purpose?
DTrace is only available on some platforms (Solaris and maybe FreeBSD?).
--
___
Python t
New submission from STINNER Victor :
Attached patch fixes Makefile.pre.in to only recompile Lib/_sysconfigdata.py
when needed.
--
files: sysconfigdata.patch
keywords: patch
messages: 149406
nosy: haypo, pitrou
priority: normal
severity: normal
status: open
title: Only recompile Lib
Changes by STINNER Victor :
--
components: +Build
___
Python tracker
<http://bugs.python.org/issue13596>
___
STINNER Victor added the comment:
Various comments on PEP 393 and your patch.
"For compatibility with existing APIs, several representations
may exist in parallel; over time, this compatibility should be phased
out."
and
"For compatibility, redundant representations may b
STINNER Victor added the comment:
> The patch in msg<148968> solves the issue for me.
Cool, I applied the patch to Python 3.2 and 3.3.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.pyt
STINNER Victor added the comment:
changeset: 74002:279b0aee0cfb
user:Victor Stinner
date:Fri Dec 16 23:56:01 2011 +0100
files: Doc/c-api/unicode.rst Include/unicodeobject.h
Modules/_localemodule.c Modules/main.c Modules/timemodule.c
description:
Add
New submission from STINNER Victor :
The curses module (only since Python 3.3), locale.strcoll(), locale.strxfrm(),
time.strftime() and imp.NullImporter() (only on Windows) accept embedded null
characters, whereas they convert the Unicode string to a wide character
(wchar_t*) string.
The
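
One way to picture the kind of check under discussion is a guard that rejects embedded null characters before a string is handed to a C API that stops at the first null wide character. The helper below is a hypothetical pure-Python sketch, not the code from any patch attached to this issue.

```python
def reject_embedded_null(text):
    """Return text unchanged, or raise ValueError if it contains a null character."""
    if "\0" in text:
        # A consumer of a wchar_t* string would silently stop at this position.
        raise ValueError(
            "embedded null character at index %d" % text.index("\0")
        )
    return text


if __name__ == "__main__":
    print(reject_embedded_null("%Y-%m-%d"))
```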
STINNER Victor added the comment:
PyUnicode_AsWideCharString() documentation should also warn about this issue.
--
___
Python tracker
<http://bugs.python.org/issue13
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue13618>
___
New submission from STINNER Victor :
To factorize the code and to fix encoding issues in the time module, I added
functions to decode/encode from/to the locale encoding:
PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and
PyUnicode_EncodeLocale() (issue #13560). During tests, I
STINNER Victor added the comment:
Ok, I think that the current code is good enough to close the issue. I opened a
more global issue about the Python codec: #13619.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
Changes by STINNER Victor :
--
keywords: +patch
Added file: http://bugs.python.org/file23985/locale_encoding.patch
___
Python tracker
<http://bugs.python.org/issue13
STINNER Victor added the comment:
# On FreeBSD, Solaris and Mac OS X, b'\xff' can be decoded in
# the C locale. The C locale is something like ISO-8859-1, not
# 7-bit ASCII.
On FreeBSD, it *is* the ISO-8859-1 encoding.
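
The difference between ISO-8859-1 and 7-bit ASCII for the byte mentioned above can be shown with Python's own codecs. This is only an illustration of the two encodings, not of how the C library decodes in the C locale.

```python
data = b"\xff"

# ISO-8859-1 maps every byte 0x00..0xFF to the code point with the same value.
latin1_text = data.decode("iso-8859-1")

# 7-bit ASCII only covers 0x00..0x7F, so the same byte is rejected.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

if __name__ == "__main__":
    print(ascii(latin1_text), ascii_ok)
```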
--
___
Python trac
STINNER Victor added the comment:
http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%203.x/builds/1327/steps/test/logs/stdio
==
ERROR: test_list_active (test.test_nntplib.NetworkedNNTPTests
STINNER Victor added the comment:
Oooh, I missed the important sentence "Accordingly, constructor arguments are
interpreted as for bytearray()." The 5 constructors are documented in bytearray
doc:
http://docs.python.org/dev/library/functions.html#bytearray
--
resolution:
STINNER Victor added the comment:
Patch version 2: improve the test. Try also the user locale encoding if the C
locale uses ISO-8859-1 (should improve the code coverage on FreeBSD, Mac OS X
and Solaris).
--
Added file: http://bugs.python.org/file23987/locale_encoding-2.patch
STINNER Victor added the comment:
> Should we fix this (Py_ssize_t, overflow check before computation), as in
> #11564?
Yes. Use Py_ssize_t type for the buf_size attribute, and replace "bigger <= 0"
(test if an overflow occurred) by "self->buf_size > (PY_SSIZE_
STINNER Victor added the comment:
I tested locale_encoding-2.patch on Linux, FreeBSD and Windows: UTF-8 and
ISO-8859-1 locales on Linux and FreeBSD, and the cp1252 ANSI code page on
Windows.
--
___
Python tracker
<http://bugs.python.
STINNER Victor added the comment:
Sorted and grouped results. "replace", "find" and "concat" should be easy to
fix, "format" is a little bit more complex, "strip" and "split" depend on
"find" performance and require to
STINNER Victor added the comment:
Sorted and grouped results. "replace", "find" and "concat" should be easy to
fix, "strip" and "split" depend on "find" performance.
replace:
- b"...text.with.2000.lines...replace(b"
STINNER Victor added the comment:
Grouped results.
find (first):
- (b"A"*1000).find(b"A"): -70%
- (b"A"*1000).rfind(b"A") : -70%
- (b"A"*1000).index(b"A") : -71%
- (b"A"*1000).rindex(b"A") : -68%
- (
STINNER Victor added the comment:
Boris.FELD told me that there was a bug in compare.py: all numbers are related
to Unicode (see #13621), not bytes.
--
___
Python tracker
<http://bugs.python.org/issue13
Changes by STINNER Victor :
--
resolution: -> invalid
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue13622>
___
STINNER Victor added the comment:
See also the issue #13621 for results on Unicode.
--
___
Python tracker
<http://bugs.python.org/issue13623>
___
STINNER Victor added the comment:
See also the issue #13623 for results on bytes.
--
___
Python tracker
<http://bugs.python.org/issue13621>
___
New submission from STINNER Victor :
iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3
than Python 3.2. The performance depends on the characters of the input string:
* 8x faster (!) for a string of 50.000 ASCII characters
* 1.5x slower for a string of 50.000
Changes by STINNER Victor :
--
nosy: +flox
___
Python tracker
<http://bugs.python.org/issue13623>
___
Changes by STINNER Victor :
--
nosy: +flox
___
Python tracker
<http://bugs.python.org/issue13621>
___
STINNER Victor added the comment:
> Can you please provide your exact testing procedure?
Here you go.
$ cat bench.sh
echo -n "ASCII: "
./python -m timeit 'x="A"*5' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit
STINNER Victor added the comment:
Oh, Antoine told me that I missed the -s command line argument to timeit:
$ cat bench.sh
echo -n "ASCII: "
./python -m timeit -s 'x="A"*5' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m t
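
The effect of the -s option can also be sketched from Python with the timeit module: the string is built in the untimed setup statement, so only the encode call is measured. The helper name time_utf8_encode and the loop count are assumptions made for this illustration.

```python
import timeit


def time_utf8_encode(setup, number=1000):
    """Time x.encode("utf-8"); x is created by the untimed setup statement."""
    return timeit.timeit('x.encode("utf-8")', setup=setup, number=number)


if __name__ == "__main__":
    # Same statement and setup as the ASCII line of bench.sh.
    print("ASCII:", time_utf8_encode('x="A"*5'))
```

Without a separate setup, the cost of building the string would be included in every timed iteration.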
STINNER Victor added the comment:
Python 3.2 (narrow):
ASCII: 1 loops, best of 3: 28.2 usec per loop
UCS-1: 1 loops, best of 3: 59.1 usec per loop
UCS-2: 1 loops, best of 3: 88.8 usec per loop
UCS-4: 1000 loops, best of 3: 254 usec per loop
Python 3.2 (wide):
ASCII: 1 loops
STINNER Victor added the comment:
> 8x faster (!) for a string of 50.000 ASCII characters
Oooh, it's just faster because encoding ASCII to UTF-8 is now O(1). The ASCII
data is shared with the UTF-8 data thanks to the PEP 393!
--
___
Python
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue13530>
___
STINNER Victor added the comment:
Thanks for the patch Jérémy.
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue13530>
___
STINNER Victor added the comment:
> (b"A"*1000).find(b"A"): -70%
This one is a performance regression introduced by #12170. Attached patch
checks object type before trying a conversion to size_t instead of catching an
exception.
--
keywords:
STINNER Victor added the comment:
bytes_find.patch only works for Python int, not for objects with an __index__
method. My new patch (bytes_find-2.patch) uses PyNumber_Check() instead of
PyLong_Check() to be more generic. It also fixes a different issue: raise the
same ValueError as bytes.find
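
The behaviours mentioned here can be observed from Python. The sketch below reflects what current CPython 3 does, not the patch itself, and the ByteValue class is a hypothetical integer-like object written for this example.

```python
class ByteValue:
    """Hypothetical integer-like object that only exposes __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


data = b"abc"

# A plain int and an object with __index__ are both accepted as a byte value.
pos_int = data.find(98)
pos_index = data.find(ByteValue(98))


def find_error(value):
    """Return the name of the exception raised by data.find(value), or None."""
    try:
        data.find(value)
    except (ValueError, OverflowError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(pos_int, pos_index, find_error(256), find_error(2 ** 100))
```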
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file24012/bytes_find-2.patch
___
Python tracker
<http://bugs.python.org/issue13623>
___
Changes by STINNER Victor :
Added file: http://bugs.python.org/file24013/bytes_find-2.patch
___
Python tracker
<http://bugs.python.org/issue13623>
___
STINNER Victor added the comment:
New changeset 75648db1b3f3 by Victor Stinner in branch 'default':
http://hg.python.org/cpython/rev/75648db1b3f3
Issue #13623: Fix a performance regression introduced by issue #12170 in
bytes.find() and handle correctly OverflowError (raise the same
STINNER Victor added the comment:
I checked stringbench: there is no more performance regression (difference of
more than 20%).
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/i
STINNER Victor added the comment:
_Py_c_pow() doc is wrong:
+ If :attr:`exp.imag` is not null, or :attr:`exp.real` is negative,
+ this method returns zero and sets :c:data:`errno` to :c:data:`EDOM`.
The function only fails if num=0 and exp.real < 0 or if num=0 and exp.imag !
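
At the Python level, a failing complex power with a zero base surfaces as ZeroDivisionError, while a non-zero base with a negative or complex exponent is fine. The helper name complex_pow_fails is hypothetical; the sketch only exercises the ** operator on complex numbers, not the C function directly.

```python
def complex_pow_fails(base, exponent):
    """Return True if base ** exponent raises ZeroDivisionError."""
    try:
        base ** exponent
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    print(complex_pow_fails(0j, -1 + 0j))      # zero base, negative real exponent
    print(complex_pow_fails(1 + 0j, -1 + 0j))  # non-zero base, negative real exponent
    print(complex_pow_fails(2 + 0j, 1j))       # non-zero base, imaginary exponent
```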
STINNER Victor added the comment:
> "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%
I also noticed a difference between Python 3.2 and 3.3, but Python 3.3 is 13%
*faster* (and not slower). This benchmark is not really representative because
str
STINNER Victor added the comment:
> I also noticed a difference between Python 3.2 and 3.3,
> but Python 3.3 is 13% *faster* (and not slower).
Oops, I misused the timeit module, there is a regression.
> New changeset c802bfc8acfc by Victor Stinner in branch 'default':
>
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue13522>
___
STINNER Victor added the comment:
Updated patch to also fix the size of the small buffer on the stack, as
suggested by Antoine.
--
Added file: http://bugs.python.org/file24021/utf8_encoder-2.patch
___
Python tracker
<http://bugs.python.
STINNER Victor added the comment:
utf8_encoder_prescan.patch: precompute the size of the output to avoid a
PyBytes_Resize() at exit. It is much slower:
ASCII: 10 loops, best of 3: 2.06 usec per loop
UCS-1: 1 loops, best of 3: 123 usec per loop
UCS-2: 1 loops, best of 3: 171 usec
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file24005/utf8_encoder.patch
___
Python tracker
<http://bugs.python.org/issue13624>
___
STINNER Victor added the comment:
Patch version 3 to fix compiler warnings (avoid variables used for the error
handler, unneeded for UCS-1).
--
Added file: http://bugs.python.org/file24023/utf8_encoder-3.patch
___
Python tracker
<h
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue13624>
___
New submission from STINNER Victor :
If Python is compiled with gcc -O3, gdb is unable to get the f argument of
PyEval_EvalFrameEx(). It is possible to retrieve "f" from the caller,
PyEval_EvalCodeEx().
Attached patch tries to implement this idea and enable more test_gdb tests on
STINNER Victor added the comment:
> It's actually still O(n): the UTF-8 data still need to be copied
> into a bytes object.
Hum, correct, but a memory copy is much faster than having to decode UTF-8.
--
___
Python tracker
<http://b
STINNER Victor added the comment:
embedded_nul-2.patch: a more complete patch that also checks for null bytes in
functions calling PyUnicode_EncodeFSDefault().
--
Added file: http://bugs.python.org/file24041/embedded_nul-2.patch
___
Python tracker
<h
STINNER Victor added the comment:
There is a failure on an XP buildbot. I don't know if it is a sporadic issue or
not.
http://www.python.org/dev/buildbot/all/builders/x86%20XP-5%203.x/builds/3921/steps/test/logs/stdio
==
STINNER Victor added the comment:
> It is possible to retrieve "f" from the caller, PyEval_EvalCodeEx()
It does not always work, but it works sometimes, so it's better to try :-)
I applied my fix to Python 2.7, 3.2 and 3.3. libpython.py of Python 2.7 is
outdated, it should
STINNER Victor added the comment:
> Currently when running Python on a non-OSX posix environment
> under either the C locale, or with an invalid or missing locale,
> it's not possible to operate using unicode filenames outside
> the ascii range.
It was already discussed:
STINNER Victor added the comment:
> under either the C locale, or with an invalid or missing locale
The right fix is to fix your locale, not Python.
--
___
Python tracker
<http://bugs.python.org/issu
STINNER Victor added the comment:
> If there was a separate LC_FILENAMES then Python could respect
> that and insist people set it, but there isn't.
For one month, we had a PYTHONFSENCODING environment variable. It was not a good
idea. Again: please read the discussion (in cl
STINNER Victor added the comment:
> There are two problems with this: one is just the practical
> one that it scales poorly to have to tell every user to do this
> and to take them through working out how to set this in a way
> that covers cron jobs, daemons, things run over ssh, e
STINNER Victor added the comment:
> The main problem I see being discussed is that
> changing the encoding after Python starts would
> be dangerous, which I agree with, but we're not
> proposing to do that.
Not after Python start. Using two encodings at the same would just a
STINNER Victor added the comment:
It would be possible to implement an incremental decoder with mbsrtowcs() and an
incremental encoder with wcsrtombs(), by serializing mbstate_t to a long
integer (TextIOWrapper.tell() does something like that). The problem is that
mbsrtowcs() and wcsrtombs() are
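
The idea of a decoder that carries state between calls, and whose state can be read out, can be illustrated with Python's own UTF-8 incremental decoder. This is a sketch of the concept only; it uses the codecs module rather than mbsrtowcs() or mbstate_t.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is b"\xc3\xa9" in UTF-8; feed it one byte at a time.
first = decoder.decode(b"\xc3")       # incomplete sequence: nothing is emitted yet
pending, flags = decoder.getstate()   # the buffered byte is part of the state
second = decoder.decode(b"\xa9")      # the sequence completes here

if __name__ == "__main__":
    print(repr(first), pending, ascii(second))
```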
STINNER Victor added the comment:
I should not write comments so late :-p
> Not after Python start. Using two encodings at the same would just ...
at the same time
> ... because I would like to inconsistency.
because it would lead to inconsist
STINNER Victor added the comment:
> Having more than one encoding on unix is already a reality, there's nothing
> to stop someone setting LANG=de_DE.UTF-8 and LC_MESSAGES=C say.
Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG
variable: use the first non-emp
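
The lookup order described above can be sketched as a small function over an environment mapping. The function name locale_encoding_variable is hypothetical, and the sketch only picks the variable; it does not parse the encoding out of its value.

```python
def locale_encoding_variable(environ):
    """Return (name, value) of the first non-empty variable among
    LC_ALL, LC_CTYPE and LANG, or None if all are unset or empty."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return name, value
    return None


if __name__ == "__main__":
    # LC_MESSAGES is never consulted, so it cannot change the result.
    print(locale_encoding_variable({"LANG": "de_DE.UTF-8", "LC_MESSAGES": "C"}))
```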
STINNER Victor added the comment:
> By default the Python SSL/TLS Stack (client/server) expose
> unsecure protocols (SSLv2) and unsecure ciphers (EXPORT 40bit DES).
If there is a problem, it should not be fixed in Python, but in the underlying
library (OpenSSL) or in applications. Pytho
STINNER Victor added the comment:
+ self.name = self.name.encode("iso-8859-1", "replace")
Why did you choose ISO-8859-1? I think that the filesystem encoding should be
used instead:
-self.name = self.name.encode("iso-8859-1", "replace")
+
STINNER Victor added the comment:
> I'm not sure about the best module to host this, though: os.path ?
Some OSes don't provide atomic rename. If we only define a function when it is
atomic (as we do in the posix module, only expose functions available on the
OS), programs will
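
For context, a common write-then-rename pattern looks like the sketch below. It assumes os.replace() (available since Python 3.3), which the documentation describes as atomic on POSIX when it succeeds. The helper name write_atomically is hypothetical, and the atomicity guarantee depends on the platform and on both paths being on the same filesystem.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "config.txt")
        write_atomically(target, b"hello\n")
        print(open(target, "rb").read())
```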
STINNER Victor added the comment:
"The gzip format (defined in RFC 1952) allows storing the original filename
(without the .gz suffix) in an additional field in the header (the FNAME
field). Latin-1 (iso-8859-1) is required."
Hum, it looks like the author of the gzip program (on Li
STINNER Victor added the comment:
> it will still be passing values that can't be
> interpreted by other processes as you highlighed earlier.
On UNIX, data going outside Python has to be encoded: you pass byte strings,
not directly Unicode. Surrogates are encoded back to ori
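
The round trip through surrogates can be shown with the surrogateescape error handler (PEP 383). The byte string below is an arbitrary example chosen for this sketch, with UTF-8 standing in for the locale encoding.

```python
raw = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original byte.
back = text.encode("utf-8", "surrogateescape")

if __name__ == "__main__":
    print(ascii(text), back)
```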
STINNER Victor added the comment:
This discussion is becoming very long, I don't remember the original
purpose. You want to use UTF-8 instead of ASCII, so what? What do you
want to do with your nicely well decoded filenames? You cannot print it
to your terminal nor pass it to a subpr
STINNER Victor added the comment:
>> Nope. The locale encoding is chosen using LC_ALL, LC_CTYPE or LANG
>> variable: use the first non-empty variable. LC_MESSAGES doesn't affect
>> the encoding. Example:
>
> That's good to know, thanks. Only leaves the case
STINNER Victor added the comment:
On 22/12/2011 02:16, Martin Pool wrote:
> The proposal is that in some cases where Python currently assumes
> filenames are ascii on Linux, it ought to instead assume they are
> utf-8.
Oh, I expected a use case describing the problem, not the
STINNER Victor added the comment:
> The problem as I see it is this:
>
> On Linux, filenames are generally (but not always) in UTF-8; people
> fairly commonly end up with no locale configured, which causes Python
> to decode filenames as ascii. It is easy for this to end up with
STINNER Victor added the comment:
+ encoding = locale.getpreferredencoding()
It should be locale.getpreferredencoding(False).
--
___
Python tracker
<http://bugs.python.org/issue13
Changes by STINNER Victor :
--
title: Python Crashes When Saving Or Opening -> IDLE: Python Crashes When
Saving Or Opening
___
Python tracker
<http://bugs.python.org/issu
STINNER Victor added the comment:
> Victor, could you try the attached script on FreeBSD,
> to see if you get ECONNREFUSED?
Yes, I get an ECONNREFUSED. I tested backlog.py on FreeBSD 8.2.
--
___
Python tracker
<http://bugs.python.org/i
STINNER Victor added the comment:
timemodule.c has the following check:
#if defined(_MSC_VER) || defined(sun)
if (buf.tm_year + 1900 < 1 || < buf.tm_year + 1900) {
PyErr_SetString(PyExc_ValueError,
"strftime() requires year
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue13703>
___
STINNER Victor added the comment:
> Unless there's evidence of performance regressions
> or backward incompatibilities, I agree.
If hash() is modified, str(dict) and str(set) will change for example. It may
break doctests. Can we consider that the application should not rely
(ind
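
As an illustration of the doctest concern, output that depends on set iteration order can be made stable by sorting before formatting. The helper name stable_repr is hypothetical and the sketch assumes the elements are mutually comparable.

```python
def stable_repr(items):
    """Return a repr that does not depend on hash-based iteration order."""
    return repr(sorted(items))


if __name__ == "__main__":
    # str({"b", "a"}) may print in either order; the sorted form does not.
    print(stable_repr({"b", "a"}))
```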
Changes by STINNER Victor :
--
keywords: +patch
Added file: http://bugs.python.org/file24135/3106cc0a2024.diff
___
Python tracker
<http://bugs.python.org/issue13
STINNER Victor added the comment:
> I assume this is left over from the PEP 393 changes.
Correct.
> I'm not sure such a restriction needs to exist any more.
The restriction was introduced to simplify the implementation. maxchar has to
be computed exactly in format_stri