[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Terry J. Reedy Thu, 08 Sep 2011 11:56:19 -0700

Terry J. Reedy <tjre...@udel.edu> added the comment:

On 9/8/2011 4:32 AM, Ezio Melotti wrote:
> So to summarize a bit, there are different possible levels of strictness:
>    1) all the possible encodable values, including the ones > 10FFFF;
>    2) values in range 0..10FFFF;
>    3) values in range 0..10FFFF except surrogates (aka scalar values);
>    4) values in range 0..10FFFF except surrogates and noncharacters;
>
> and this is what is currently available in Python:
>    1) not available, probably it will never be;
>    2) available through the 'surrogatepass' error handler;
>    3) default behavior (i.e. with the 'strict' error handler);
>    4) currently not available.
>
> Now, assume that we don't care about option 1 and want to implement the 
> missing option 4 (which I'm still not 100% sure about).  The possible options 
> are:
>    * add a new codec (actually one for each UTF encoding);
>    * add a new error handler that explicitly disallows noncharacters;
>    * change the meaning of 'strict' to match option 4;


If 'strict' meant option 4, then 'scalarpass' could mean option 3. 
'surrogatepass' would then mean 'pass surrogates also, in addition to 
non-character scalars'.
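
To make the levels concrete, here is a rough sketch of how levels 2 to 4 
differ in practice. It assumes Python 3.3 or later, where ord() always 
sees whole code points. The helper names is_noncharacter() and 
strictness_level() are made up for illustration and are not part of any 
codec or stdlib API. 'scalarpass' is only a proposed name and does not 
exist, so it is not used here.

```python
def is_noncharacter(cp):
    # Hypothetical helper: the 66 Unicode noncharacters are
    # U+FDD0..U+FDEF plus the last two code points of each plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strictness_level(s):
    # Hypothetical helper: return the strictest of levels 2..4
    # (as numbered in the quoted summary) that the string satisfies.
    cps = [ord(c) for c in s]
    if any(0xD800 <= cp <= 0xDFFF for cp in cps):
        return 2  # contains a surrogate: only level 2 accepts it
    if any(is_noncharacter(cp) for cp in cps):
        return 3  # scalar values only, but includes a noncharacter
    return 4      # no surrogates and no noncharacters


# Existing codec behavior: 'strict' rejects a lone surrogate (level 3),
# while 'surrogatepass' lets it through (level 2).
lone = "\ud800"
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

passed = lone.encode("utf-8", "surrogatepass")

# A noncharacter such as U+FFFE is accepted by 'strict' today, which is
# why level 4 is listed as currently not available.
nonchar_bytes = "\ufffe".encode("utf-8")
```

Under the renaming suggested above, a string for which 
strictness_level() returns 3 would be rejected by 'strict' and accepted 
by 'scalarpass', and one that returns 2 would need 'surrogatepass'.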

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________