Martin von Gagern <martin.vgag...@gmx.net> added the comment:

My first indication to use "macintosh" rather than "mac_roman" came from Wikipedia, http://en.wikipedia.org/wiki/Mac_OS_Roman, which states that the charset part of a MIME content-type specification should be "macintosh". I'm not quoting this as any kind of authority, but rather to point out that people are likely to use this name.

I compared http://tools.ietf.org/rfc/rfc1345.txt (RFC) with ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT (UNI) using the attached Perl script. The results:

3 codepoints unused in RFC but defined in UNI: f0, f6, f7
1 codepoint unused in UNI but defined in RFC: 7f
2 codepoints with slightly different character names, same meaning
9 codepoints with actually different definitions:

  a5: rfc 2219 BULLET OPERATOR
      uni 2022 BULLET
  c4: rfc e023 DUTCH GUILDER SIGN (IBM437 159)
      uni 0192 LATIN SMALL LETTER F WITH HOOK
  c6: rfc 0394 GREEK CAPITAL LETTER DELTA
      uni 2206 INCREMENT
  c9: rfc 22ef MIDLINE HORIZONTAL ELLIPSIS
      uni 2026 HORIZONTAL ELLIPSIS
  d0: rfc 2014 EM DASH
      uni 2013 EN DASH
  d1: rfc 2013 EN DASH
      uni 2014 EM DASH
  d7: rfc 25c6 BLACK DIAMOND
      uni 25ca LOZENGE
  db: rfc 00a4 CURRENCY SIGN
      uni 20ac EURO SIGN
  f8: rfc 203e OVERLINE
      uni 00af MACRON

a5 and c6 could be different interpretations of symbols that look pretty much the same. The euro sign replacing the generic currency sign seems to be a recent modification documented in UNI. The swapped order of the two dashes (d0 and d1) is really confusing.

Notice also this line in the RFC:

  &rem source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991

So it looks like the RFC used the Unicode definition as its source. Which part of it I'm not sure, and where the differences come from I'm even less sure.

My next steps:

* Look for further references, e.g. from Apple, and compare them as well
* Try some things out on a Mac to see how it behaves in real life
* Compare all this to the current Python implementation
* Write a patch to provide "macintosh" either as an alias or as a new charset

Help welcome.
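
As a rough starting point for the "compare to the current Python implementation" step, something like the following might do. It is only a sketch: the DIFFERENCES table is copied by hand from the nine entries listed above, and the helper names classify and report are made up for illustration. It decodes each disputed byte with a codec (by default "mac_roman") and says whether the result matches the RFC value, the UNI value, or neither.

```python
import unicodedata

# Hand-copied from the comparison above (not parsed from the RFC or ROMAN.TXT):
# byte -> (RFC 1345 code point, Apple ROMAN.TXT code point)
DIFFERENCES = {
    0xA5: (0x2219, 0x2022),
    0xC4: (0xE023, 0x0192),
    0xC6: (0x0394, 0x2206),
    0xC9: (0x22EF, 0x2026),
    0xD0: (0x2014, 0x2013),
    0xD1: (0x2013, 0x2014),
    0xD7: (0x25C6, 0x25CA),
    0xDB: (0x00A4, 0x20AC),
    0xF8: (0x203E, 0x00AF),
}


def classify(byte, codec="mac_roman"):
    """Return 'rfc', 'uni' or 'neither' for one disputed byte (hypothetical helper)."""
    decoded = ord(bytes([byte]).decode(codec))
    rfc, uni = DIFFERENCES[byte]
    if decoded == uni:
        return "uni"
    if decoded == rfc:
        return "rfc"
    return "neither"


def report(codec="mac_roman"):
    """Classify every disputed byte for the given codec (hypothetical helper)."""
    return {byte: classify(byte, codec) for byte in sorted(DIFFERENCES)}


if __name__ == "__main__":
    for byte, verdict in report().items():
        char = bytes([byte]).decode("mac_roman")
        # Private-use code points have no Unicode name, hence the default.
        name = unicodedata.name(char, "<unnamed>")
        print(f"{byte:02x}: U+{ord(char):04X} {name} -> matches {verdict}")
```

Running it against a different codec name, e.g. report("latin-1"), gives a quick sanity check that the classification is not trivially reporting a match.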

----------
nosy: +gagern
Added file: http://bugs.python.org/file12982/compare.pl

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue843590>
_______________________________________