On Tue, 4 Sep 2018 13:59:10, Doug Henderson wrote:
My preference is to remove the output fiddling code that Corrina has been working on. It is trying to solve the wrong problem. I think we have gone down a rabbit hole at the wrong end of cat's data flow.
this has nothing to do with "cat". it has to do with the unfounded design decision to use U+2592. Granted at this point we are bikeshedding - but an official standard does exist, namely Unicode, with 2 applicable characters for this use case:

1. U+FFFD: http://unicode.org/charts/nameslist/n_FFF0.html
2. U+25A1: http://unicode.org/charts/nameslist/n_25A0.html
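For reference, the standard names of the code points mentioned in this thread can be looked up with Python's unicodedata module. This is only a small sketch; the helper name describe is made up for illustration and is not part of Cygwin or mintty.

```python
import unicodedata

# Code points mentioned in this thread: the two proposed characters
# and the one currently under discussion.
CANDIDATES = ["\uFFFD", "\u25A1", "\u2592"]


def describe(char):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    for char in CANDIDATES:
        print(describe(char))
```

U+FFFD is the character Unicode names REPLACEMENT CHARACTER, U+25A1 is WHITE SQUARE, and U+2592 is MEDIUM SHADE.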
Should any changes to the way a character is displayed be required, they need to be made in the terminal program that displays the character, not in cygwin, which should pass the character along unmodified.
the "terminal" in this case is either "cygwin" or "xterm" - in both cases code changes have already been made in response to this thread, so i don't think your comment here holds weight.
Both cygwin and Debian 9.5 show:

$ file alfa.txt
alfa.txt: ISO-8859 text

When Linux reads the file, it assumes the encoding is UTF-8. When cygwin reads the file, it assumes the encoding is CP1252.

This command shows the problem:

$ iconv -f utf8 alfa.txt
iconv: alfa.txt:1:0: incomplete character or shift sequence

On Linux, this shows a slightly different message, with the same intent. Try using this string:

$ printf "\xC3\xAB\353\n"
ë▒

to get a better understanding of the problem. It contains two representations of LATIN SMALL LETTER E WITH DIAERESIS, first encoded in UTF-8, then using ISO-8859-1.
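The same bytes can be inspected outside the terminal. Here is a minimal Python sketch, assuming only the byte string from the printf example above; the helper name decode_views is made up for illustration. It shows how the three bytes look under each encoding mentioned, and where a strict UTF-8 decoder gives up.

```python
# Same bytes as: printf "\xC3\xAB\353\n"  (octal 353 is 0xEB)
RAW = b"\xC3\xAB\xEB\n"


def decode_views(raw):
    """Decode the same bytes under several assumptions (hypothetical helper)."""
    views = {}
    try:
        views["utf-8 strict"] = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # The lone 0xEB byte is not a valid UTF-8 sequence on its own.
        views["utf-8 strict"] = f"error at byte offset {err.start}: {err.reason}"
    # A lenient decoder substitutes U+FFFD for the invalid byte.
    views["utf-8 replace"] = raw.decode("utf-8", errors="replace")
    # Single-byte encodings accept every byte here, so the UTF-8 pair
    # C3 AB turns into two separate characters.
    views["cp1252"] = raw.decode("cp1252")
    views["iso-8859-1"] = raw.decode("iso-8859-1")
    return views


if __name__ == "__main__":
    for name, text in decode_views(RAW).items():
        print(f"{name:14} {text!r}")
```

Under the lenient UTF-8 decoding the stray byte becomes U+FFFD; which glyph is then drawn for it is up to the terminal and its font, which is the point being argued in this thread.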
now it appears *you* are going down the rabbit hole. both Cygwin and Mintty were in violation of the Unicode standard - however this has already been remedied in the code.
There are two different reasons for the MEDIUM SHADE. Here it indicates an invalid UTF-8 character, and the font does not have a glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also used in place of an ordinary character without a glyph in the font.
this is flat wrong. U+2592 MEDIUM SHADE is *only* used in cases of invalid UTF-8. In the case of a missing character, the ".notdef" glyph is used - as has been discussed several times in this thread. This is not an actual character, so i cannot paste it here - but as an example, with "DejaVu Sans Mono" the glyph is an empty rectangle.