Thank you for your advice on setting my locale to en_US.UTF-8.
Unfortunately, Cygwin still seems to have trouble displaying some
three-byte UTF-8 encoded characters correctly. For example, see the
following snippet from a "sed" file. This file attempts to convert
percent-encoded (URL-style "%XX") filenames to UTF-8. As you can see,
it handles one- and two-byte encodings correctly, but some three-byte
characters (the en dash, the em dash, and the ellipsis) are each
displayed as a filled-in rectangle:

# Match longest strings first
# Three-byte encodings:
# En dash
s/%[Ee]2%80%93/–/g
# Em dash
s/%[Ee]2%80%94/—/g
# Horizontal ellipsis
s/%[Ee]2%80%[Aa]6/…/g
# Less-than-or-equal sign
s/%[Ee]2%89%[Aa]4/≤/g
# Euro symbol
s/%[Ee]2%82%[Aa][Cc]/€/g
# Two-byte encodings:
# Non-break space
#s/%[Cc]2%[Aa]0/⎵/g
# Lowercase a with acute accent
s/%[Cc]3%[Aa]1/á/g
# Lowercase a with umlaut (a.k.a. diaeresis)
s/%[Cc]3%[Aa]4/ä/g
# Lowercase e with acute accent
s/%[Cc]3%[Aa]9/é/g
# Lowercase i with acute accent
s/%[Cc]3%[Aa][Dd]/í/g
# Lowercase o with acute accent
s/%[Cc]3%[Bb]3/ó/g
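
One way to tell a conversion problem from a display problem is to dump the bytes that sed actually writes. The following is only a minimal sketch, assuming a POSIX shell with printf, sed, od, and tr available and a script saved as UTF-8. The sample input 'a%E2%80%93b' and the variable name decoded_hex are made up for illustration.

```shell
# Hypothetical check: run the en dash rule on a made-up sample string
# and print the resulting bytes as hex, with whitespace stripped.
decoded_hex=$(printf '%s' 'a%E2%80%93b' \
  | sed 's/%[Ee]2%80%93/–/g' \
  | od -An -tx1 \
  | tr -d ' \n')

# If the middle bytes are e2 80 93, sed produced a valid UTF-8 en dash,
# and the rectangle most likely comes from the terminal or its font.
echo "$decoded_hex"
```

If the output is 61e2809362, the substitution itself is working, which would point toward the terminal font lacking those glyphs rather than toward sed or the locale setting.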
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple