On Sun, 08 May 2005 19:49:42 +0200, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>John Machin wrote:
>> Martin, I can't guess the reason for this last suggestion; why should
>> a Windows system use iso-8859-1 instead of cp1252?
>
>Windows users often think that windows-1252 is the same thing as
>iso-8859-1, and then exchange data in windows-1252, but declare them
>as iso-8859-1 (in particular, this is common for HTML files).
>iso-8859-1 is more portable than windows-1252, so it should be
>preferred when the data need to be exchanged across systems.

Martin, it seems I'm still a long way short of enlightenment; please
bear with me:

Terminology disambiguation: what I call "users" wouldn't know what
'cp1252' and 'iso-8859-1' were. They're not expected to know. They just
type in whatever characters they can see on their keyboard or find in
the charmap utility. It's what I'd call 'admins' and 'developers' who
should know better, but often don't.

1. When exchanging data across systems, should not utf-8 be
preferred???

2. If the Windows *users* have been using characters that are in cp1252
but not in iso-8859-1, then attempting to convert to iso-8859-1 will
cause an exception.

>>> import unicodedata
>>> euro_win = chr(128)
>>> euro_uc = euro_win.decode('cp1252')
>>> euro_uc
u'\u20ac'
>>> unicodedata.name(euro_uc)
'EURO SIGN'
>>> euro_iso = euro_uc.encode('iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)
>>>

I find it a bit hard to imagine that the euro sign wouldn't get a fair
bit of usage in Swedish data processing even if it's not their own
currency.

3. How portable is a character set that doesn't include the euro sign?

Regards,
John
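
P.S. For anyone reading this on Python 3, points 1 and 2 above can be sketched roughly as follows. This is only an illustration, not part of the session shown earlier: it assumes Python 3, where bytes and str replace the old str and unicode pair, and the helper name cp1252_only is made up for the example.

```python
import unicodedata

# Byte 128 (0x80) is the euro sign in cp1252.
euro_win = b"\x80"
euro_uc = euro_win.decode("cp1252")
euro_name = unicodedata.name(euro_uc)

# Point 2: a strict conversion to iso-8859-1 raises, because U+20AC
# lies outside the 0..255 range that iso-8859-1 can represent.
try:
    euro_uc.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Point 1: utf-8 can encode the same character and round-trips cleanly.
euro_utf8 = euro_uc.encode("utf-8")
round_trip_ok = euro_utf8.decode("utf-8") == euro_uc


def cp1252_only():
    """Hypothetical helper: characters that cp1252 assigns to bytes
    0x80..0x9F, the range iso-8859-1 reserves for control codes."""
    chars = []
    for byte in range(0x80, 0xA0):
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # A few bytes in this range are undefined in Python's cp1252 codec.
            continue
    return chars


if __name__ == "__main__":
    print(euro_name, strict_ok, round_trip_ok)
    print("".join(cp1252_only()))
```

Any character returned by cp1252_only() would hit the same UnicodeEncodeError as the euro sign if a strict conversion to iso-8859-1 were attempted, which is the practical risk in point 2.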