Changes by STINNER Victor :
Added file: http://bugs.python.org/file18671/Py_UNICODE_strcat.patch
___
Python tracker
<http://bugs.python.org/issue9425>
___
___
Python-bug
STINNER Victor added the comment:
Py_UNICODE_strcat.patch: create Py_UNICODE_strcat() function.
Py_UNICODE_strdup.patch: create Py_UNICODE_strdup() function.
--
Added file: http://bugs.python.org/file18672/Py_UNICODE_strdup.patch
___
Python tracker
STINNER Victor added the comment:
The problem is not specific to Py_CompileString(): all functions based
(indirectly) on PyParser_ASTFromString() and PyParser_ASTFromFile() expect
filenames encoded in utf-8 with the strict error handler.
If we choose to use something other than utf-8 in
Changes by STINNER Victor :
--
components: +Unicode -None
versions: +Python 3.2
___
Python tracker
<http://bugs.python.org/issue9713>
___
___
Python-bugs-list m
STINNER Victor added the comment:
utf-8 codec (in strict mode) rejects surrogates in python3, and so you don't
support undecodable filenames (filenames decoded using surrogateescape error
handler which produces surrogate characters). It may be possible if you use
surrogateescape every
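A minimal sketch of the behaviour described above, assuming a Python 3
interpreter. The file name is made up for illustration; only the codec names
and error handlers come from the message.

```python
# A file name containing a byte that is not valid utf-8 (hypothetical name).
raw_name = b'data\xff.txt'

# surrogateescape maps the undecodable byte 0xFF to the lone surrogate U+DCFF.
name = raw_name.decode('utf-8', 'surrogateescape')

# utf-8 in strict mode refuses to encode that lone surrogate.
try:
    name.encode('utf-8')
    strict_accepts = True
except UnicodeEncodeError:
    strict_accepts = False

# Using surrogateescape on the encoding side as well restores the bytes.
restored = name.encode('utf-8', 'surrogateescape')
```

The strict encoder fails, while using the error handler on both sides
round-trips the original bytes.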
STINNER Victor added the comment:
Ok to remove it from Python 3.2. I don't think that it is necessary to update
Python 2.7 code/doc.
--
___
Python tracker
<http://bugs.python.org/i
STINNER Victor added the comment:
> According to the Unicode standard the high and low surrogate halves used
> by UTF-16 (...)
Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes.
Eg. b'abc\xff'.decode('ascii', 'surrogateescape')
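The mapping can be checked with a short Python 3 sketch. It assumes nothing
beyond the built-in ascii codec and the surrogateescape error handler; the
variable names are illustrative.

```python
# Decode a byte string with one non-ASCII byte using surrogateescape.
decoded = b'abc\xff'.decode('ascii', 'surrogateescape')

# Every byte 0x80..0xFF is stored as the code point 0xDC00 + byte,
# which is exactly the U+DC80..U+DCFF range.
mapping = {
    byte: ord(bytes([byte]).decode('ascii', 'surrogateescape'))
    for byte in range(0x80, 0x100)
}
```

Here decoded is 'abc\udcff', and mapping[0x80] and mapping[0xFF] are 0xDC80
and 0xDCFF.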
New submission from STINNER Victor :
Many C functions have a bytes argument (char* type) but the encoding is not
documented. It would not be a problem if the encoding was always the same, but
it is not. Examples:
- format of PyUnicode_FromFormat() should be encoded as ISO-8859-1
- filename of
STINNER Victor added the comment:
r84429 creates Py_UNICODE_strcat() (change with the patch: return the right
value).
r84430 creates PyUnicode_strdup() (change with the patch: rename the function
from Py_UNICODE_strdup() to PyUnicode_strdup() and mangle the function name
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18672/Py_UNICODE_strdup.patch
___
Python tracker
<http://bugs.python.org/issue9425>
___
___
Python-bug
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18671/Py_UNICODE_strcat.patch
___
Python tracker
<http://bugs.python.org/issue9425>
___
___
Python-bug
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue7077>
___
___
Python-bugs-list mailing list
Unsubscribe:
STINNER Victor added the comment:
<< I found this crash while playing with proxies (thanks haypo).
http://code.activestate.com/recipes/496741-object-proxying/ >>
My question was: why does isinstance(Proxy('abc'), str) work (gives True),
whereas re.match('abc'
STINNER Victor added the comment:
>>> class Spam(object):
...     def __getattribute__(self, name):
...         if name == '__class__':
...             return str
...         raise AttributeError
...
>>> spam = Spam('spam')
>>> isinstance(s
STINNER Victor added the comment:
I am able to reproduce the crash with z > 4:
# (magic, type (rle, bpp), dim, x, y, z)
open('image', 'wb').write(struct.pack('>6h', 0732, 1, 1, 1, 1, 10))
rgbimg.longimagedata('image')
--
But not
New submission from STINNER Victor :
I'm trying to document the encoding of all bytes arguments of the C API: see
#9738. I tried to understand which encoding is used by PyUnicode_FromFormat*()
(and PyErr_Format() which calls PyUnicode_FromFormatV()). It looks like
ISO-8859-1
STINNER Victor added the comment:
About PyErr_Format() and PyUnicode_FromFormat*() encoding: it's not exactly
ISO-8859-1... there is a bug => issue #9769.
--
___
Python tracker
<http://bugs.python.or
STINNER Victor added the comment:
Another possibility is to use _Py_char2wchar() + PyUnicode_FromWideChar() /
_Py_wchar2char() + PyUnicode_AsWideChar() to decode / encode filenames. These
functions use the locale encoding. This solution was possible in Python 3.1,
but no longer in Python 3.2
STINNER Victor added the comment:
> In such environments you cannot expect the user to configure the
> system properly (i.e. set an environment variable).
Why would it be different for embedded Python?
> Instead, the application has to provide an educated guess
> to the Python in
STINNER Victor added the comment:
PyUnicode_Check(op) checks op->ob_type->tp_flags & Py_TPFLAGS_UNICODE_SUBCLASS.
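The difference between the two checks can be observed from Python 3 without
writing C. This is only a sketch: the Spam class mirrors the one posted
earlier on the issue, and str.upper() is used here as a stand-in for any C
code that checks the real type rather than __class__.

```python
class Spam:
    # Pretend to be a str by lying about __class__.
    def __getattribute__(self, name):
        if name == '__class__':
            return str
        raise AttributeError(name)

spam = Spam()

# isinstance() falls back to the __class__ attribute, so it is fooled.
fooled = isinstance(spam, str)

# type() looks at the real type of the object and is not fooled.
real_type = type(spam)

# C-level code that checks the real type rejects the object.
try:
    str.upper(spam)
    rejected = False
except TypeError:
    rejected = True
```

isinstance() reports True, while the real type is still Spam and the str
method raises TypeError.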
--
___
Python tracker
<http://bugs.python.
STINNER Victor added the comment:
I have different questions:
- Should we trust PyObject_IsInstance() or PyUnicode_Check() (because they
give different results)?
- Should PyObject_IsInstance() and PyUnicode_Check() give the same result?
- Should we fix the segfault?
To fix the segfault, I
STINNER Victor added the comment:
Oh, I didn't see that the issue was specific to Python2. I updated the issue's
title. If I understood correctly, the issue is also specific to Windows.
Do you know if your patch changes the public API? (break the compatibility)
--
FYI abo
STINNER Victor added the comment:
> Do we really want to support this kind of configuration?
There is also a problem if the directory name is b'py3k\xe9': at startup (utf-8
encoding), the name is decoded to 'py3k\udce9'. When the locale encoding is set
to iso-885
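The first half of this scenario can be reproduced in Python 3. The excerpt is
cut off before it says what happens once the locale encoding changes, so the
last step below (re-decoding the same bytes as iso-8859-1) is an assumption
added for illustration, not something stated in the message.

```python
directory = b'py3k\xe9'

# At startup the name is decoded from utf-8 with surrogateescape:
# the invalid byte 0xE9 becomes the lone surrogate U+DCE9.
at_startup = directory.decode('utf-8', 'surrogateescape')

# Assumed follow-up: get the original bytes back, then decode them with
# the encoding that matches the directory name.
original = at_startup.encode('utf-8', 'surrogateescape')
redecoded = original.decode('iso-8859-1')
```

at_startup is 'py3k\udce9', and the re-decoded name is 'py3k\xe9' (py3ké).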
STINNER Victor added the comment:
> PyUnicode_FromFormat("%s", text) expects a utf-8 buffer.
Really? I don't see how "*s++ = *f;" (where s is Py_UNICODE* and f is char*)
can decode utf-8. It looks more like ISO-8859-1.
> Very recently (r84472, r84485), some
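Why a plain "*s++ = *f;" copy looks like ISO-8859-1 can be shown in Python 3:
copying each byte value into one code point is exactly what the iso-8859-1
decoder does, and it differs from utf-8 for non-ASCII input. This is a sketch
of the argument only; it does not call PyUnicode_FromFormat().

```python
# Emulate "*s++ = *f": one byte becomes one code point with the same value.
def byte_copy(data):
    return ''.join(chr(byte) for byte in data)

all_bytes = bytes(range(256))
same_as_latin1 = byte_copy(all_bytes) == all_bytes.decode('iso-8859-1')

# A utf-8 encoded "e acute" is two bytes: utf-8 decodes them to one
# character, whereas the byte copy produces two characters.
utf8_bytes = '\xe9'.encode('utf-8')
as_utf8 = utf8_bytes.decode('utf-8')
as_copy = byte_copy(utf8_bytes)
```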
STINNER Victor added the comment:
About "embedded Python interpreters or py2exe-style applications": do you mean
that the application calls a C function to set the encoding before starting the
interpreter? Or do you mean the Python function, sys.setfilesystemencoding()?
I would like
STINNER Victor added the comment:
"keep the C function"
Hum, currently, Python3 only has a *private* function called
_Py_SetFileSystemEncoding() which can only be called after _Py_InitializeEx()
(because it relies on the codecs API). If you consider that there is a real use
case,
STINNER Victor added the comment:
I committed my patch (with a new test, iso-8859-1:replace) to 2.7: r84621. I
will not backport to 2.6 because this branch now only accepts security fixes.
--
resolution: -> fixed
status: open -> closed
___
STINNER Victor added the comment:
> My remark is that utf-8 tend to be applied to all kind of files;
> if someone once decide that non-ascii chars are allowed in (some)
> string constants, they will be stored in utf-8.
In this case, it will be better to raise an error on non-a
STINNER Victor added the comment:
For unicode, ascii(x) is implemented as repr(x).encode('ascii',
'backslashreplace').decode('ascii').
repr(x) is "'" + x + "'" for printable characters (eg. U+1D121), and "'U+%08x'"
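The stated equivalence can be checked in Python 3 with a few sample strings.
The samples are chosen for illustration (ASCII, one-byte non-ASCII, a non-BMP
character and a lone surrogate); the helper name is hypothetical.

```python
def ascii_via_repr(x):
    # The equivalent expression given for ascii(x) on str objects.
    return repr(x).encode('ascii', 'backslashreplace').decode('ascii')

samples = ['abc', '\xe9', '\U0001D121', '\uD800', "it's", '\n\t']
results = [(ascii(s), ascii_via_repr(s)) for s in samples]
```

Each pair in results holds the same string twice, for example "'\\xe9'" for
the second sample.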
STINNER Victor added the comment:
> >>> s = "'\0\"\n\r\t abcd\x85é\U00012fff\U0001D121xxx\uD800."
> (...)
> (I think I've included everything:
> - normal chars
> - control chars
> - one-byte non-ASCII
> - two-byte non-ASCII (and lone sur
STINNER Victor added the comment:
#6543 changed the encoding of the filename argument of
PyRun_SimpleFileExFlags() (and all functions based on PyRun_SimpleFileExFlags)
and c_filename attribute of the compiler (private) structure in Python 3.1.3:
use utf-8 in strict mode instead of filesystem
STINNER Victor added the comment:
See also #9713 (Py_CompileString fails on non decode-able paths) and #9738
(Document the encoding of functions bytes arguments of the C API).
--
___
Python tracker
<http://bugs.python.org/issue8
STINNER Victor added the comment:
Do you think that it is a Python bug? You should first try to report a bug on
eGenix bug tracker: http://www.egenix.com/services/support/
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue9
STINNER Victor added the comment:
> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded
> by (...) cp932
We should find a character that is not encodable in any Windows code page, but
is accepted in filenames.
> characters like "\u2661" or "\u2668" (..
New submission from STINNER Victor :
In Python 3.2, mbcs encoding (default filesystem encoding on Windows) is now
strict: raise an error on unencodable/undecodable characters/bytes. But
os.listdir(b'.') encodes unencodable bytes as b'?'.
Example:
>>> os.mkdir
STINNER Victor added the comment:
I found this bug while trying to find an unencodable filename for #9819
(TESTFN_UNDECODABLE).
Anyway, the bytes API should be avoided on Windows since the native Windows
filename type is unicode.
--
___
Python
STINNER Victor added the comment:
See also #9820.
--
___
Python tracker
<http://bugs.python.org/issue9819>
___
___
Python-bugs-list mailing list
Unsubscribe:
New submission from STINNER Victor :
It would be nice to support PEP 383 (surrogateescape) on Windows, but the mbcs
codec doesn't support it for performance reasons. The Windows functions to
encode/decode MBCS don't give the index of the unencodable/undecodable
character/byte. For en
STINNER Victor added the comment:
> os.listdir(b'listdir') should raise an error (and not ignore
> the filename or replaces unencodable characters by b'?').
To avoid the error, a solution is to support the PEP 383 on Windows (for the
mbcs encoding). I opened a separ
STINNER Victor added the comment:
> "dir" command cannot print filename correctly, though.
Who cares? We just have to be able to create a file with a name containing non
encodable characters, list the directory, and then remove this evil file.
--
With r84666, Python uses &
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18823/find_unencode_filename.py
___
Python tracker
<http://bugs.python.org/issue9819>
___
___
Pytho
STINNER Victor added the comment:
Oh wait. PEP 383 is a solution to store undecodable bytes in a unicode string,
but for mbcs I'm trying to get the opposite: store unicode in bytes and this is
not possible (at least with PEP 383).
Example with Python 3.1:
>>> print("abcŁ
STINNER Victor added the comment:
> With r84666, Python uses "-\u5171\u6709\u3055\u308c\u308b"
> suffix for TESTFN_UNENCODABLE.
Backported to 3.1 as r84668. I don't want to patch Python 2.x (its unicode
support is weaker and the code is too different from Python3) and
STINNER Victor added the comment:
Patch:
- Remove the bytes version of listdir(): reuse the unicode version but
convert the filename to bytes using PyUnicode_EncodeFSDefault() if the
directory name is not unicode
- use Py_XDECREF(d) instead of Py_DECREF(d) at the end (because d=NULL on
STINNER Victor added the comment:
Close this issue: PEP 383 is specific to filesystems using bytes, it is useless
on Windows (the problem on Windows is on encoding, not on decoding).
--
resolution: -> invalid
status: open -> closed
___
STINNER Victor added the comment:
@amaury: Do you agree to reject non-ascii bytes?
TODO: document format encoding in Doc/c-api/*.rst.
--
___
Python tracker
<http://bugs.python.org/issue9
STINNER Victor added the comment:
I didn't propose to add a new parameter to Py_InitializeEx() (which would mean
creating a new function to avoid breaking the API), I just wrote that
_Py_SetFileSystemEncoding() doesn't work for your use case.
> If you embed Python into another applic
STINNER Victor added the comment:
> If you still consider that the change on .data as a bug,
> I think that the fix is to remove .data (mark it as
> protected: environ.data => environ._data).
r84690 marks os.environ.data as protected. Close this issue again.
--
STINNER Victor added the comment:
Fixed by r84692.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue9402>
___
__
STINNER Victor added the comment:
I don't know how to fix this issue, and I don't know if it can be fixed. As the
issue is very unlikely, I prefer to close it.
--
resolution: -> wont fix
status: open -> closed
___
Pytho
STINNER Victor added the comment:
Well, it was trivial to work around this bug in my application (convert host to
bytes using explicit host = str(host)). Python3 doesn't have this issue and
Python 2.7 is released, I prefer to close this bug as wont fix.
--
resolution: -> fixe
Changes by STINNER Victor :
--
resolution: fixed -> wont fix
___
Python tracker
<http://bugs.python.org/issue7093>
___
___
Python-bugs-list mailing list
Un
STINNER Victor added the comment:
Well, it's not a bug, just a gcc warning. We don't need this patch.
--
resolution: -> wont fix
status: open -> closed
___
Python tracker
<http://bugs.p
STINNER Victor added the comment:
Does anyone agree with me?
--
___
Python tracker
<http://bugs.python.org/issue9408>
___
___
Python-bugs-list mailing list
Unsub
STINNER Victor added the comment:
It should be fixed by r84694.
--
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue8589>
___
___
Python-
STINNER Victor added the comment:
It is fixed in 2.7 with the backport of the Python3's io library (r73394).
--
resolution: accepted -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python
STINNER Victor added the comment:
New patch:
- add encoding option to TextFile constructor
- parse_makefile() uses the heuristic from text_file.diff
Note: sys.getfilesystemencoding() is always set in Python 3.2 (but it may be
None in Python 2.x and Python < 3.2).
--
Added f
STINNER Victor added the comment:
I attached a patch to #6011 to set the encoding to read the Makefile.
--
___
Python tracker
<http://bugs.python.org/issue9
STINNER Victor added the comment:
Fixed in r84696+r84697: confstr-minimal.diff +
PyUnicode_DecodeFSDefaultAndSize().
--
___
Python tracker
<http://bugs.python.org/issue9
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue9579>
___
___
Python-bugs-list
STINNER Victor added the comment:
Fixed in r84696+r84697: confstr-minimal.diff from #9579 +
PyUnicode_DecodeFSDefaultAndSize().
Thanks for the patch, sorry for the delay.
--
resolution: -> duplicate
status: open -> closed
___
Python t
Changes by STINNER Victor :
--
resolution: duplicate -> fixed
___
Python tracker
<http://bugs.python.org/issue9580>
___
___
Python-bugs-list mailing list
Un
STINNER Victor added the comment:
test_pep277.patch removes the usage of os.path.supports_unicode_filenames from
test_pep277: the test still passes on Debian Sid (Linux). Can someone test the
patch on Mac OS X, FreeBSD and Solaris (and maybe other POSIX/UNIX OSes)?
About Windows
STINNER Victor added the comment:
Oops, forget test_pep277.patch: I misunderstood r81149 (new way to detect if
the filesystem supports unicode or not). test_pep277 fails with my patch on
Linux with LC_CTYPE=C.
--
___
Python tracker
<h
STINNER Victor added the comment:
r84701 fixes supports_unicode_filenames's definition in Python 3.2 (and r84702
in Python 3.1): os.listdir(str) now always returns unicode filenames (including
non-ascii characters).
--
___
Python tracker
STINNER Victor added the comment:
> Maybe os.path.supports_unicode_filenames should be deprecated.
> The doc currently says:
> "True if arbitrary Unicode strings can be used as file names
> (within limitations imposed by the file system), and if os.listdir()
> returns U
STINNER Victor added the comment:
$ ldd $(/usr/bin/python3.1 -c 'import readline; print(readline.__file__)')|grep
curses
libncurses.so.5 => /lib/libncurses.so.5 (0xb7537000)
$ ldd /lib/libreadline.so.6|grep curses
libncurses.so.5 => /lib/libncurses.so.5 (0xb76a6
STINNER Victor added the comment:
Fixed by r84704 in Python 3.2.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/
STINNER Victor added the comment:
> How about TESTFN_UNICODE (test_unicode_file) issue?
File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 12, in
TESTFN_UNICODE.encode(TESTFN_ENCODING)
UnicodeEncodeError: 'mbcs' codec can't encode character
Changes by STINNER Victor :
Added file: http://bugs.python.org/file18845/unicode_file.patch
___
Python tracker
<http://bugs.python.org/issue9819>
___
___
Python-bug
STINNER Victor added the comment:
> Thank you, your patch works.
Ok, patch committed to 3.2 as r84710. Thanks for your feedback.
--
___
Python tracker
<http://bugs.python.org/iss
STINNER Victor added the comment:
> Still happens with r84709 on PPC Tiger 3.x
It's not the same error, PYTHONWARNINGS is decoded from the wrong encoding:
locale encoding instead of utf-8. r84731 should fix this bug (at least, it
restores the encoding used because my last commit
STINNER Victor added the comment:
Fixed by r84730, thanks for the issue.
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/
STINNER Victor added the comment:
> What do you gain with this patch? (i.e. what is its advantage?)
You know directly that os.listdir(bytes) is unable to encode the filename,
instead of manipulating an invalid filename (b'?') and getting the error later (when
you use the filenam
STINNER Victor added the comment:
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).
About mbcs, mbcs codec of Python 3.1 is like .encode('mbcs', 'repl
STINNER Victor added the comment:
It reminds me of the discussion in issue #3187. About unencodable filenames,
Guido proposed to ignore them or to use errors="replace", and wrote "Failing
the entire os.listdir() call is not acceptable". (... long discussion ...) And
STINNER Victor added the comment:
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).
If we choose to keep this behaviour, I will have to revert my commit on mbcs
cod
STINNER Victor added the comment:
> I fail to see why removing incorrect file names from the result
> list is any better than keeping them. The result list will
> be incorrect either way.
It depends if you focus on displaying the content of the directory, or on
processing
STINNER Victor added the comment:
> I think trying to emulate, in Python, what the *A functions
> do is futile.
My problem is that some functions will use mbcs in strict mode (functions using
PyUnicode_EncodeFSDefault): raise UnicodeEncodeError, and others will use mbcs
in replac
STINNER Victor added the comment:
- ignoring unencodable filenames is not a good idea
- raising an error on unencodable filenames breaks backward compatibility
- I don't think that emitting a warning will change anything
Even if I don't like mbcs+replace (current behaviour of os.listdir(
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18841/test_pep277.patch
___
Python tracker
<http://bugs.python.org/issue767645>
___
___
Python-bug
STINNER Victor added the comment:
r84784 sets os.path.supports_unicode_filenames to True on Mac OS X (macpath
module).
About test_supports_unicode_filenames.patch. test_unicode_listdir() is wrong:
os.listdir(str) always returns str (see r84701). "verify that the new file's
name i
STINNER Victor added the comment:
I backported r84701 and r84784 to Python 2.7 (r84787).
--
___
Python tracker
<http://bugs.python.org/issue767645>
___
___
Pytho
Changes by STINNER Victor :
--
resolution: -> fixed
status: open -> closed
___
Python tracker
<http://bugs.python.org/issue9819>
___
___
Python-bugs-list
STINNER Victor added the comment:
> There seems to be some confusion about the macpath.py module. (...)
Oops. I thought that Mac OS X uses macpath, but in fact it is posixpath. Can
you try my new patch posixpath_darwin.patch? I reopen the issue because I
patched the wrong module. I supp
STINNER Victor added the comment:
The solution may be different depending on Python version. I propose to keep
macpath in Python 2.7, just because it's too late to change such a thing in
Python2. But we may mark macpath as deprecated, eg. "macpath will be removed in
Python 3.2"
STINNER Victor added the comment:
For non-ascii directory name but ascii locale (eg. C locale), we have 3 choices:
a- read Makefile as a binary file
b- use the PEP 383
c- refuse to compile
(a) doesn't seem easy because it looks like distutils uses the unicode type for
all path
STINNER Victor added the comment:
Warning: "use the PEP 383" may impact other distutils components because the
path may be written into other files, which means that we have to use
errors='surrogateescape' for these files too.
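A minimal sketch of option (b), assuming Python 3: a file holding a non-ascii
path is read with the ascii codec and errors='surrogateescape', then written
to a second file with the same settings. The file names and the Makefile
content are hypothetical; this is not the distutils code.

```python
import os
import tempfile

# Hypothetical Makefile line containing a non-ascii byte in a path.
raw = b'prefix = /home/user/py3k\xe9\n'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'Makefile')
    dst = os.path.join(tmp, 'Makefile.copy')
    with open(src, 'wb') as f:
        f.write(raw)

    # Reading: the undecodable byte becomes a lone surrogate, no error.
    with open(src, encoding='ascii', errors='surrogateescape',
              newline='') as f:
        text = f.read()

    # Writing elsewhere needs the same error handler, otherwise the
    # surrogate cannot be encoded.
    with open(dst, 'w', encoding='ascii', errors='surrogateescape',
              newline='') as f:
        f.write(text)

    with open(dst, 'rb') as f:
        copied = f.read()
```

The copy is byte-for-byte identical to the source, which is why every file
receiving the path would need the same error handler.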
--
__
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue8998>
___
___
Python-bugs-list mailing list
Unsubscribe:
STINNER Victor added the comment:
> No problems noted with a quick test of posixpath_darwin.patch
> on 10.6 so looks good.
Ok thanks. Fix committed to 3.2 (r84866) and 2.7 (r84868). I kept my patch on
macpath (supports_unicode_filenames=True) because it is still valid (even if it
is no
STINNER Victor added the comment:
I don't see any test_warnings anymore on
http://code.google.com/p/bbreport/wiki/PythonBuildbotReport. Close this issue.
--
status: open -> closed
___
Python tracker
<http://bugs.python.or
Changes by STINNER Victor :
--
nosy: +haypo
___
Python tracker
<http://bugs.python.org/issue4661>
___
___
Python-bugs-list mailing list
Unsubscribe:
STINNER Victor added the comment:
New version of the patch:
- reencode sys.path_importer_cache (and remove the last FIXME)
- fix different reference leaks
- catch PyIter_Next() failures
- create a subfunction to reencode sys.modules: it's easier to review and
manage errors in sh
Changes by STINNER Victor :
Removed file: http://bugs.python.org/file18561/reencode_modules_path-2.patch
___
Python tracker
<http://bugs.python.org/issue9630>
___
___
STINNER Victor added the comment:
> I would rename the feature to something like "redecode-modules"
Yes, right. I will rename the functions before committing the patch.
--
___
Python tracker
<http://bugs.pyth
STINNER Victor added the comment:
> Why is this needed ?
Short answer: to support a filesystem encoding other than utf-8. See #8611
for a longer explanation.
Example:
$ pwd
/home/SHARE/SVN/py3ké
$ PYTHONFSENCODING=ascii ./python test_fs_encoding.py
Fatal Python error: Py_Initial
STINNER Victor added the comment:
> Not sure it's related, but there seems to be a bug:
It's not a bug, it's a feature :-) If you specify a non-existing locale, the
GNU libc falls back to ascii.
$ locale -a
C
français
french
fr_FR
fr...@euro
fr_FR.iso88591
fr_fr.iso885.
STINNER Victor added the comment:
> Some things about your patch:
> - as Amaury said, functions should be named "redecode*"
> rather than "reencode*"
Yes, as written before (msg117269), I will do it in my next patch.
> - please use -1 for error return, n
STINNER Victor added the comment:
On Friday 24 September 2010 14:35:29, Marc-Andre Lemburg wrote:
> Thanks for the explanation. So the only reason why you have to go through
> all those hoops is to
>
> * allow the complete set of Python supported encoding names