[EMAIL PROTECTED] wrote:
> Python seems to be missing a UCS-32 codec, even in wide builds (not
> that the build should matter).
> Is there some deep reason or should I just contribute a patch?

The only reason is that nobody has needed one so far, and because it
is quite some work to do if done correctly. Why do you need it?

> There should be '-le' and '-be' variants, I suppose. Should there be a
> variant without explicit endianness, using a BOM to decide (like
> 'utf-16')?

Right.

> And it should combine surrogates into valid characters (on all builds),
> like the 'utf-8' codec does, right?

Right. Also, it should support the incremental interface (as any
multi-byte codec should).

If you want it complete, it should also support line-oriented input.
Notice that .readline/.readlines is particularly difficult to
implement, as you can't rely on the underlying stream's .readline
implementation to provide meaningful results.

While we are discussing problems: there is also the issue of whether
.readline/.readlines should take the additional Unicode linebreak
characters into account (e.g. U+2028, U+2029), and if so, whether that
should be restricted to "universal newlines" mode.

Regards,
Martin
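
To make the BOM and incremental points above concrete, here is a rough
sketch of what an incremental decoder might look like, written in
present-day Python. The class name Utf32IncrementalDecoder is
hypothetical and this is not the interface of the codecs module; the
fallback to big-endian when no BOM is present is an assumption made for
the sketch. It leaves out surrogate combining and line-oriented input
entirely.

```python
import struct

BOM_LE = b"\xff\xfe\x00\x00"
BOM_BE = b"\x00\x00\xfe\xff"


class Utf32IncrementalDecoder:
    """Hypothetical sketch of an incremental UTF-32 decoder."""

    def __init__(self):
        self.buffer = b""      # bytes received but not yet decoded
        self.byteorder = None  # "<" or ">" once it has been decided

    def decode(self, data, final=False):
        self.buffer += data
        if self.byteorder is None:
            if len(self.buffer) < 4 and not final:
                return ""  # not enough bytes yet to look for a BOM
            if self.buffer[:4] == BOM_LE:
                self.byteorder = "<"
                self.buffer = self.buffer[4:]
            elif self.buffer[:4] == BOM_BE:
                self.byteorder = ">"
                self.buffer = self.buffer[4:]
            else:
                # Assumption for this sketch: big-endian without a BOM.
                self.byteorder = ">"
        # Decode only whole 4-byte units; keep the remainder buffered.
        usable = len(self.buffer) - len(self.buffer) % 4
        chunk, self.buffer = self.buffer[:usable], self.buffer[usable:]
        if final and self.buffer:
            raise ValueError("truncated UTF-32 data")
        fmt = "%s%dI" % (self.byteorder, usable // 4)
        return "".join(chr(cp) for cp in struct.unpack(fmt, chunk))
```

Feeding the decoder arbitrary slices of the input should give the same
text as decoding everything at once, which is the property the
incremental interface is meant to guarantee.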