[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Terry J. Reedy Fri, 02 Sep 2011 12:24:29 -0700

Terry J. Reedy <tjre...@udel.edu> added the comment:

Ezio, that is a lot of nice work to track down those pieces of the standard. I 
think the operative phrase in many of those quotes is 'open interchange'. 
Codecs are also used for private storage. If I use the unassigned or 
private-use code points in a private project, I would use utf-8 to save the 
work between active sessions. That is quite fine under the standard. But I 
should not put files with such codings on the net for 'open interchange'. And 
if I receive them, the one thing I should not do is interpret them as 
meaningful abstract characters.
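
A minimal sketch of the private-storage case, assuming Python 3 and using U+E000 (the first code point of the BMP private-use area) as a stand-in for whatever a private project might assign; the variable names are illustrative only:

```python
import unicodedata

# U+E000 is a private-use code point; its meaning here is purely local.
private_text = "note \ue000 end"

# Save and reload with utf-8, as one would between sessions.
encoded = private_text.encode("utf-8")
decoded = encoded.decode("utf-8")  # default error handler is 'strict'

round_trips = decoded == private_text
category = unicodedata.category("\ue000")  # 'Co' marks private use

print(round_trips, category)
```

The codec carries the code point through unchanged; it attaches no meaning to it, which is left to whoever owns the private agreement.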


So the codec should allow for both public and private use. I have the 
impression that it does so now. A Python programmer should know whether the 
codec is being used for private (or local group) files or random stuff from the 
net, and therefore, what the appropriate error handling is. If they do not know, 
the docs could suggest that public text should normally be decoded with 
'strict' or 'replace' and that 'ignore' should normally be reserved for local 
text that is known to intentionally have 'errors'.
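
To illustrate the three error handlers named above, here is a rough sketch assuming Python 3. The helper names `decode_public` and `decode_local` are hypothetical, not part of any library, and the sample bytes are made up:

```python
def decode_public(data: bytes, lenient: bool = False) -> str:
    """Decode text from an untrusted source (hypothetical helper).

    'strict' raises on malformed input; 'replace' substitutes U+FFFD.
    """
    return data.decode("utf-8", errors="replace" if lenient else "strict")


def decode_local(data: bytes) -> str:
    """Decode local text known to contain intentional 'errors' (hypothetical)."""
    return data.decode("utf-8", errors="ignore")


# 0xFF can never appear in well-formed UTF-8.
bad = b"caf\xc3\xa9 \xff ok"

try:
    decode_public(bad)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = decode_public(bad, lenient=True)  # bad byte becomes U+FFFD
ignored = decode_local(bad)                  # bad byte is silently dropped
```

With 'ignore' the malformed byte leaves no trace in the result, which is why it seems better suited to data whose quirks are already understood.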

I am pretty sure that the intent of prohibiting non-standard interpretation of 
code points as abstract characters is to prevent 'organic' evolution of the 
mapping from code points to abstract characters, in which anyone (or any company) who 
wants to do so creates a new pairing and promotes its wide recognition around 
the world. Conforming implementations are strict in both what they produce 
(publicly) *and* in what they accept (from public sources). Many now think that 
liberal acceptance of sloppy html promoted sloppy production of html.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________