[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Antoine Pitrou Sun, 14 Aug 2011 10:44:55 -0700

Antoine Pitrou <pit...@free.fr> added the comment:

> > The UTF-8 codec described by RFC 2279 didn't say so, so, since our
> > codec was following RFC 2279, it was producing valid UTF-8.  With RFC
> > 3629 a number of things changed in a non-backward compatible way.
> > Therefore we couldn't just change the behavior of the UTF-8 codec nor
> > rename it to something else in Python 2.  We had to wait till Python 3
> > in order to fix it.
> 
> I'm a bit confused on this.  You no longer fix bugs in Python 2?


In general, we try not to introduce changes that have a high probability
of breaking existing code, especially when what is being "fixed" is a
minor issue which almost nobody complains about.

This is even truer for stable branches, and Python 2 is very much a
stable branch now (no more feature releases after 2.7).

> That's why I say that you are of conformance by having encoders and decoders 
> of UTF
> streams tolerate noncharacters.  You are not allowed to call something a UTF 
> and do
> non-UTF things with it, because this in violation of conformance requirement 
> C2.

Perhaps, but it is not Python's fault if the IETF and the Unicode
consortium have disagreed on what UTF-8 should be. I'm not sure what
people called "UTF-8" when support for it was first introduced in
Python, but you can't blame us for maintaining a consistent behaviour
across releases.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Reply via email to