Terry J. Reedy <tjre...@udel.edu> added the comment:

On 9/8/2011 4:32 AM, Ezio Melotti wrote:
> So to summarize a bit, there are different possible level of strictness:
>   1) all the possible encodable values, including the ones>10FFFF;
>   2) values in range 0..10FFFF;
>   3) values in range 0..10FFFF except surrogates (aka scalar values);
>   4) values in range 0..10FFFF except surrogates and noncharacters;
>
> and this is what is currently available in Python:
>   1) not available, probably it will never be;
>   2) available through the 'surrogatepass' error handler;
>   3) default behavior (i.e. with the 'strict' error handler);
>   4) currently not available.
>
> Now, assume that we don't care about option 1 and want to implement the
> missing option 4 (which I'm still not 100% sure about). The possible options
> are:
>  * add a new codec (actually one for each UTF encoding);
>  * add a new error handler that explicitly disallows noncharacters;
>  * change the meaning of 'strict' to match option 4;
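Levels 2 to 4 above can be sketched in Python 3 with the UTF-8 codec. This is a minimal illustration, not a proposed implementation: `is_noncharacter()` and `encode_level4()` are hypothetical helpers, not part of the `codecs` API. In this sketch level 4 is a separate pre-check, because the built-in 'strict' UTF-8 encoder accepts noncharacters and so never hands them to an error handler.

```python
# Sketch of strictness levels 2-4 for UTF-8 encoding (Python 3).
# is_noncharacter() and encode_level4() are hypothetical helpers.

def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def encode_level4(text: str) -> bytes:
    """Hypothetical level-4 encoder: reject noncharacters explicitly,
    then let the default 'strict' handler reject lone surrogates."""
    for i, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise UnicodeEncodeError('utf-8', text, i, i + 1,
                                     'noncharacter not allowed')
    return text.encode('utf-8')  # 'strict' is the default error handler

lone_surrogate = '\ud800'
noncharacter = '\ufffe'

# Level 3 (default 'strict'): lone surrogates are rejected...
try:
    lone_surrogate.encode('utf-8')
except UnicodeEncodeError as e:
    print('strict rejects surrogate:', e.reason)
# ...but noncharacters are encoded normally.
print('strict accepts noncharacter:', noncharacter.encode('utf-8'))

# Level 2 ('surrogatepass'): surrogates are encoded as well.
print('surrogatepass:', lone_surrogate.encode('utf-8', 'surrogatepass'))

# Level 4 (no built-in equivalent): noncharacters are rejected too.
try:
    encode_level4(noncharacter)
except UnicodeEncodeError as e:
    print('level 4 rejects noncharacter:', e.reason)
```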
If 'strict' meant option 4, then 'scalarpass' could mean option 3. 'surrogatepass' would then mean 'pass surrogates also, in addition to non-char scalars'.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________