On 2009-04-22 22:06, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>> "correct" -> "corrected"
>> Thanks, fixed.
>>
>>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>>> introduced, which decodes non-decodable bytes using into a private-use
>>>> character U+F01xx, which is believed to not conflict with private-use
>>>> characters that currently exist in Python codecs.
>>> Would this mean that real private use characters in the file name would
>>> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
>>> any error handler.
>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
>
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.

Actually it's not even true for the pre-Unicode codecs. It was and is
common for Asian companies to use company-specific symbols in private
areas or extended versions of CJK character sets.

Microsoft even published an editor that lets Asian users create their
own glyphs as needed:

http://msdn.microsoft.com/en-us/library/cc194861.aspx

Here's an overview of some US companies using such extensions:

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=VendorUseOfPUA

(it's no surprise that most of these actually defined their own charsets)

SIL even started a registry for the private use areas (PUAs):

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA

This is their current list of assignments:

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=SILPUAassignments

and here's how to register:

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA#404a261e

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 22 2009)
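
The claim that pre-Unicode codecs can map into the PUA is easy to check. The following is only a rough sketch, assuming Python 3 and looking only at single bytes and the BMP private use area (U+E000..U+F8FF); the helper name pua_bytes is made up for illustration. Python's mac_roman codec, for instance, decodes byte 0xF0 to U+F8FF (the Apple logo).

```python
# Basic Multilingual Plane private use area: U+E000..U+F8FF.
# (The supplementary PUA planes are not checked in this sketch.)
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF


def pua_bytes(encoding):
    """Return {byte value: code point} for single bytes that decode into the BMP PUA.

    Hypothetical helper for illustration; multi-byte sequences are not examined.
    """
    hits = {}
    for value in range(256):
        try:
            text = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined (or incomplete) in this codec
        if len(text) == 1 and PUA_FIRST <= ord(text) <= PUA_LAST:
            hits[value] = ord(text)
    return hits


if __name__ == "__main__":
    for name in ("mac_roman", "latin-1", "cp1252"):
        found = pua_bytes(name)
        print(name, {hex(b): "U+%04X" % cp for b, cp in found.items()})
```

Multi-byte CJK codecs would need a scan over byte sequences rather than single bytes, so this only covers the simplest case.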