[issue5127] UnicodeEncodeError - I can't even see license

2009-10-06 Thread Amaury Forgeot d'Arc
Changes by Amaury Forgeot d'Arc: Added file: http://bugs.python.org/file15058/unicodectype_ucs4_3.patch

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-06 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > that would cause lots of compiler > warnings and implicit truncation on UCS2 builds Unfortunately, there is no such warning, or the initial problem we are trying to solve would have been spotted by such a warning (unicode_repr() calls Py_UNICODE_ISPRI
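The silent truncation discussed here can be imitated in Python. This is only a sketch: `truncate_to_ucs2` is a hypothetical helper, not a CPython function, and it assumes that passing a 32-bit code point through a 16-bit parameter simply drops the high bits.

```python
def truncate_to_ucs2(code_point):
    """Hypothetical helper: keep only the low 16 bits of a code point."""
    return code_point & 0xFFFF


original = 0x10428  # DESERET SMALL LETTER LONG I, a lowercase letter outside the BMP
truncated = truncate_to_ucs2(original)

# The truncated value names an unrelated character (CYRILLIC CAPITAL LETTER SHA),
# so a character-type query on it answers a different question.
print(hex(truncated))
print(chr(original).islower())
print(chr(truncated).islower())
```

Nothing in the arithmetic signals that information was lost, which is consistent with the remark that no compiler warning flagged the original problem.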

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-06 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: It's not as easy as that. The functions for case conversion are used in a way that assumes they never fail (and indeed, the existing functions cannot fail). What we can do is change the input parameter to Py_UCS4, but not the Py_UNICODE output parameter, s

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-06 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: So the discussion is now on 2 points: 1. Is the change backwards compatible? (at the code level, after recompilation). My answer is yes, because all known case transformations stay in the same plane: if you pass a char in the BMP, they return a char in t
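The "same plane" claim can be spot-checked from Python. The sketch below is an illustration, not part of the patch: `plane` and `cross_plane_case_mappings` are hypothetical helpers, and the check covers only the Deseret capital letters rather than the whole of Unicode.

```python
def plane(code_point):
    """Return the Unicode plane number (plane 0 is the BMP)."""
    return code_point >> 16


def cross_plane_case_mappings(code_points):
    """Hypothetical helper: collect (source, result) pairs whose case
    mapping lands in a different plane than the source code point."""
    crossings = []
    for cp in code_points:
        ch = chr(cp)
        for mapped in (ch.lower(), ch.upper()):
            for out in mapped:  # a mapping may expand to several characters
                if plane(ord(out)) != plane(cp):
                    crossings.append((cp, ord(out)))
    return crossings


# Deseret capital letters (U+10400..U+10427) live in plane 1.
deseret_capitals = range(0x10400, 0x10428)
print(cross_plane_case_mappings(deseret_capitals))
print(chr(0x10400).lower() == chr(0x10428))
```

Passing a wider range to `cross_plane_case_mappings` would test the claim against whatever Unicode version the running interpreter ships with.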

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Adam Olsen
Adam Olsen added the comment: On Mon, Oct 5, 2009 at 12:10, Marc-Andre Lemburg wrote: > All this is just nitpicking, really. UCS2 is a character set, > UTF-16 an encoding. UCS is a character set, for most purposes synonymous with the Unicode character set. UCS-2 and UTF-16 are both encodings
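For readers following the UCS-2 versus UTF-16 argument, the practical difference is how a code point above U+FFFF is stored. The sketch below shows the UTF-16 surrogate arithmetic; `to_surrogate_pair` is a hypothetical helper written for this illustration, checked against Python's own `utf-16-be` codec.

```python
def to_surrogate_pair(code_point):
    """Hypothetical helper: split a non-BMP code point into UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))

# The standard codec emits the same two 16-bit code units.
encoded = "\U00010000".encode("utf-16-be")
print(encoded.hex())
```

A strictly UCS-2 consumer would treat those two code units as two separate, unpaired values, which is the behaviour the thread is arguing about.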

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Adam Olsen wrote: > > Adam Olsen added the comment: > > On Mon, Oct 5, 2009 at 03:03, Marc-Andre Lemburg > wrote: >> We use UCS2 on narrow Python builds, not UTF-16. >> >>> We might keep the old public API for compatibility, but it should be >>> clearly

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Adam Olsen
Adam Olsen added the comment: On Mon, Oct 5, 2009 at 03:03, Marc-Andre Lemburg wrote: > We use UCS2 on narrow Python builds, not UTF-16. > >> We might keep the old public API for compatibility, but it should be >> clearly marked as broken for non-BMP scalar values. > > That has always been the

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> We'd need to expose the UCS4 APIs *in addition* >> to those APIs and have the UCS2 APIs redirect to the UCS4 ones. > > Why have two names for the same function? it's Python 3, a

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > We'd need to expose the UCS4 APIs *in addition* > to those APIs and have the UCS2 APIs redirect to the UCS4 ones. Why have two names for the same function? it's Python 3, after all. Or is this "no recompile" feature so important (as long as changes are

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: This is off-topic for the tracker item, but I'll reply anyway: Ezio Melotti wrote: > > Ezio Melotti added the comment: > >>> We might keep the old public API for compatibility, but it should be >>> clearly marked as broken for non-BMP scalar values. > >

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> we should make sure that it's not possible to load an extension >> compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns. > > This is the case with this patch: today

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Ezio Melotti
Ezio Melotti added the comment: >> We might keep the old public API for compatibility, but it should be >> clearly marked as broken for non-BMP scalar values. > That has always been the case. UCS2 doesn't support surrogates. > However, we have been slowly moving into the direction of making >

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > we should make sure that it's not possible to load an extension > compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns. This is the case with this patch: today all these functions (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #d

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> No, but changing the APIs from 16-bit integers to 32-bit integers >> does require a recompile of all code using it. > > Is it acceptable between 3.1 and 3.2 for example? ISTM th

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > No, but changing the APIs from 16-bit integers to 32-bit integers > does require a recompile of all code using it. Is it acceptable between 3.1 and 3.2 for example? ISTM that other changes already require recompilation of extension modules. > Also, the

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Adam Olsen wrote: > > Adam Olsen added the comment: > > Surrogates aren't optional features of UTF-16, we really need to get > this fixed. That includes .isalpha(). We use UCS2 on narrow Python builds, not UTF-16. > We might keep the old public API for

[issue5127] UnicodeEncodeError - I can't even see license

2009-10-04 Thread Adam Olsen
Adam Olsen added the comment: Surrogates aren't optional features of UTF-16, we really need to get this fixed. That includes .isalpha(). We might keep the old public API for compatibility, but it should be clearly marked as broken for non-BMP scalar values. I don't see a problem with changing

[issue5127] UnicodeEncodeError - I can't even see license

2009-09-24 Thread Ezio Melotti
Changes by Ezio Melotti: priority: -> normal stage: -> patch review

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > I must be missing some detail, but what does the Unicode database > have to do with the unicodeobject.c C API ? Ah, now I understand your concerns. My suggestion is to change only the 20 functions in unicodectype.c: _PyUnicode_IsAlpha, _PyUnicode_ToLo

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Ezio Melotti
Ezio Melotti added the comment: haypo> ord() of Python3 (narrow build) rejects surrogate characters: haypo> '\U00010000' haypo> >>> len(chr(0x10000)) haypo> 2 haypo> >>> ord(0x10000) haypo> TypeError: ord() expected string of length 1, but int found ord() works fine on Py3, you probably meant t

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-03 14:50, Amaury Forgeot d'Arc wrote: > Amaury Forgeot d'Arc added the comment: > >> That would cause major breakage in the C API > > Not if you recompile. I don't see how this breaks the API at the C level. Well, then try to look at such a c

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread STINNER Victor
STINNER Victor added the comment: lemburg> This is not possible for unichr() in Python 2.x, since applications lemburg> always expect len(unichr(x)) == 1 Oh, ok. lemburg> Changing ord() would be possible in Python 2.x is easier, since lemburg> this would only extend the range of returned value

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: > That would cause major breakage in the C API Not if you recompile. I don't see how this breaks the API at the C level. > and is not inline with the intention of having a Py_UNICODE > type in the first place. Py_UNICODE is still used as the allocatio

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-03 14:14, STINNER Victor wrote: > STINNER Victor added the comment: > > amaury> Since r56395, ord() and chr() accept and return surrogate pairs > amaury> even in narrow builds. > > Note: My examples are made with Python 2.x. > >> The goal is

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread STINNER Victor
STINNER Victor added the comment: amaury> Since r56395, ord() and chr() accept and return surrogate pairs amaury> even in narrow builds. Note: My examples are made with Python 2.x. > The goal is to remove most differences between narrow and wide unicode > builds (except for string lengths, in

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-03 13:39, Amaury Forgeot d'Arc wrote: > Amaury Forgeot d'Arc added the comment: > > Since r56395, ord() and chr() accept and return surrogate pairs even in > narrow builds. > > The goal is to remove most differences between narrow and wide unic

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Since r56395, ord() and chr() accept and return surrogate pairs even in narrow builds. The goal is to remove most differences between narrow and wide unicode builds (except for string lengths, indices or slices) To address this problem, I suggest to chan
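What "ord() accepts a surrogate pair" amounts to can be sketched as follows. `ord_of_pair` is a hypothetical helper for illustration only; it is not how the builtin is implemented. The `sys.maxunicode` check is the usual way to tell a narrow build (0xFFFF) from a wide one (0x10FFFF).

```python
import sys


def ord_of_pair(high, low):
    """Hypothetical helper: combine two surrogate code units into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex(ord_of_pair(0xD800, 0xDC00)))

# False on a narrow build, True on a wide build.
print(sys.maxunicode > 0xFFFF)
```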

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread Ezio Melotti
Ezio Melotti added the comment: FWIW, on Python3 it seems to work: >>> import unicodedata >>> unicodedata.category("\U00010000") 'Lo' >>> unicodedata.category("\U00011000") 'Cn' >>> unicodedata.category(chr(0x10000)) 'Lo' >>> unicodedata.category(chr(0x11000)) 'Cn' >>> ord(chr(0x10000)), 0x10000

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-03 Thread STINNER Victor
STINNER Victor added the comment: I don't understand the behaviour of unichr(): Python 2.7a0 (trunk:68963M, Jan 30 2009, 00:49:28) >>> import unicodedata >>> unicodedata.category(u"\U00010000") 'Lo' >>> unicodedata.category(u"\U00011000") 'Cn' >>> unicodedata.category(unichr(0x10000)) Traceback

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-02 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: There were non-ASCII characters in the Windows license file. This was corrected with r67860. > I believe that chr(0x10000) and chr(0x11000) should have the > opposite behavior. This other problem is because on a narrow unicode build, Py_UNICODE_ISPRINT
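The link between printability and repr() can be observed from Python. This is a sketch under assumptions: U+10000 (a Linear B letter, category Lo) stands for an assigned non-BMP character, and U+10FFFF (a noncharacter, category Cn) stands for an unassigned one, since code point assignments change between Unicode versions.

```python
import unicodedata

rows = []
for cp in (0x10000, 0x10FFFF):
    ch = chr(cp)
    # repr() leaves printable characters alone and escapes the rest;
    # ascii() always escapes, so equality means repr() escaped the character.
    escaped_by_repr = repr(ch) == ascii(ch)
    rows.append((hex(cp), unicodedata.category(ch), ch.isprintable(), escaped_by_repr))

for row in rows:
    print(row)
```

On an interpreter where the printable test sees the full code point, the assigned letter is left unescaped by repr() and the unassigned one is escaped.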

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-01 Thread Ezio Melotti
Ezio Melotti added the comment: Here (winxpsp2, Py3, cp850-terminal) the license works fine: >>> license Type license() to see the full license text and license() works as well. I get this output for the chr()s: >>> chr(0x10000) '\U00010000' >>> chr(0x11000) Traceback (most recent call last):
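The terminal-side failure can be reproduced without a cp850 console by encoding explicitly. This is a minimal sketch, assuming the traceback comes from writing an unescaped non-BMP character to a cp850 stream; the `backslashreplace` error handler is one standard way to avoid the exception.

```python
text = "\U00010000"

# Strict encoding fails because cp850 has no mapping for this character.
try:
    text.encode("cp850")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)

# With backslashreplace, the character is written as an ASCII escape instead.
fallback = text.encode("cp850", errors="backslashreplace")
print(fallback)
```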

[issue5127] UnicodeEncodeError - I can't even see license

2009-02-01 Thread Venusaur
New submission from Venusaur : >>> license Traceback (most recent call last): File "", line 1, in File "C:\Python30\lib\site.py", line 372, in __repr__ self.__setup() File "C:\Python30\lib\site.py", line 359, in __setup data = fp.read() File "C:\Python30\lib\io.py", line 1724, in