Marc-Andre Lemburg added the comment:

While it may be desirable to have repr(unicode) return a non-ASCII string, the suggested approach is not suitable for solving the problem.
repr() is usually used in logging, and applications, users and tools don't expect to suddenly find non-ASCII data, or even mixed encodings, in a log file.

If you do want to make this more flexible, then make the encoding used by unicode_repr() adjustable: turn the existing code into a codec (e.g. "unicode-repr") and leave it set up as the default. Users who wish to see non-ASCII repr(unicode) data can then adjust the encoding to their liking. This is both more flexible and backwards compatible with 2.x.

Also note that the Unicode database was separated from the interpreter core to keep the interpreter footprint manageable. It's not a good idea to just dump the complete table set into unicodeobject.c via an #include. If you need to reference APIs from modules in C, the usual approach is to create a PyCObject, which is then exported by the module (see e.g. the datetime module) and imported by the code needing it.

BTW: "printable" is not a defined term in Unicode. What is or is not printable really depends on the use case; e.g. quite a few code points in Unicode don't result in any glyph being "printed" to the screen. A Unicode string could then look as if it had fewer code points than it actually does, which is not really what you want when debugging code or sifting through log files.

----------
nosy: +lemburg

__________________________________
Tracker <http://bugs.python.org/issue2630>
__________________________________
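To illustrate the adjustable-encoding idea, here is a minimal sketch in Python 3 terms. It does not implement the proposed "unicode-repr" codec. Instead it swaps in a simpler technique: it starts from Python 3's repr() (which already keeps printable non-ASCII characters) and re-encodes the result with a chosen encoding and the standard "backslashreplace" error handler, so anything the encoding cannot represent comes back as a \x, \u or \U escape. REPR_ENCODING and repr_in_encoding() are hypothetical names used only for this example. With the default "ascii" setting, the output matches the built-in ascii(), which is the backwards-compatible behaviour.

```python
import codecs
from typing import Optional

# Hypothetical stand-in for the adjustable encoding proposed for
# unicode_repr(); "ascii" keeps the escape-everything default.
REPR_ENCODING = "ascii"


def repr_in_encoding(text: str, encoding: Optional[str] = None) -> str:
    """Return repr(text) with every code point that the chosen encoding
    cannot represent replaced by a backslash escape."""
    enc = encoding or REPR_ENCODING
    codecs.lookup(enc)  # fail fast with LookupError on an unknown name
    # repr() already escapes quotes, backslashes and non-printable code
    # points; "backslashreplace" escapes whatever the encoding cannot hold.
    return repr(text).encode(enc, "backslashreplace").decode(enc)


if __name__ == "__main__":
    print(repr_in_encoding("café"))                # ASCII-only default
    print(repr_in_encoding("café", "utf-8"))       # user opts in to non-ASCII
    print(repr_in_encoding("café €", "latin-1"))   # é kept, € escaped
```

With the default, "café" comes out as 'caf\xe9'; passing "utf-8" keeps 'café' as is, and "latin-1" keeps the é but escapes the euro sign as \u20ac. The choice of what appears literally is left to the user, while the default output stays pure ASCII for log files.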
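The point about "printable" can be made concrete with Python 3's own definition, which is only one of many possible choices. str.isprintable() treats code points in the Unicode "Other" and "Separator" categories as non-printable, except the ASCII space. The sketch below uses the standard unicodedata module. code_point_report() is a hypothetical helper that lists each code point with its general category and that printability flag.

```python
import unicodedata
from typing import List, Tuple


def code_point_report(text: str) -> List[Tuple[str, str, bool]]:
    """List each code point with its Unicode general category and whether
    Python's str.isprintable() treats it as printable."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isprintable())
        for ch in text
    ]


if __name__ == "__main__":
    # U+0301 COMBINING ACUTE ACCENT is category Mn, counted as printable,
    # so repr() shows it literally: two code points that render as one "é".
    print(code_point_report("e\u0301"), repr("e\u0301"))
    # U+200B ZERO WIDTH SPACE is category Cf ("Other"), so repr() escapes it.
    print(code_point_report("a\u200bb"), repr("a\u200bb"))
```

Under this rule the zero-width space is escaped, but the combining accent passes through. A reader of the repr() output therefore sees a single "é" where the string actually holds two code points, which is exactly the debugging hazard described above.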