Hi all,

I investigated the fmt code to find out where it breaks for me. I found
this thread when I checked whether my problem had already been discussed.

On Sat, Jan 09, 2016 at 02:44:09AM +0100, Thomas Klausner wrote:
> On Fri, Jan 08, 2016 at 04:21:13PM -0800, Tom Spindler (moof) wrote:
[...]
> > FWIW, I'm fine with replacing fmt with a newer version, but I'd like to
> > have a better idea of what it fixes.
>
> For me, it randomly breaks non-ASCII characters. I haven't really
> understood what it does; I think it strips out parts of the code
> points if it doesn't understand them.

The point where fmt breaks for me is this: it tries to skip over
non-printable characters using this sequence:

	if(!(isprint(c) || c == '\t' || c >= 160)) {
		c = getc(fi);
		continue;
	}

Now, ß and ÄÖÜ and some Greek letters - let me randomly insert ασδφ
here - are represented in UTF-8 by hex 0xCY 0xZZ with
0x80 <= 0xZZ < 0xa0, so the second byte is skipped over and lost; the
0xCY then combines with some innocent follow-up byte to produce
something unspeakable.

Most of my needs are solved by a version with c >= 128 in the above,
maybe depending on strcmp(getenv("LC_CTYPE"), "utf-8"). This is a
horrible hack and overestimates the screen space needed, but that's
good enough for me now.

Regards,
	-is
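
P.S. To make the effect concrete, here is a minimal sketch of the two
checks side by side. It is not the actual fmt source: keep_original(),
keep_patched() and filter_bytes() are hypothetical names made up for
illustration, and it assumes the default "C" locale, where isprint()
only accepts printable ASCII.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* The check as quoted above: keep printable ASCII, tabs and bytes >= 160.
 * Bytes 128..159 are rejected. */
int keep_original(unsigned char c)
{
    return isprint(c) || c == '\t' || c >= 160;
}

/* The hack: lower the limit to 128, so every UTF-8 lead byte and
 * continuation byte (0x80..0xBF) is kept. */
int keep_patched(unsigned char c)
{
    return isprint(c) || c == '\t' || c >= 128;
}

/* Hypothetical helper: copy the bytes of src[0..n) accepted by keep()
 * into dst, which must have room for n bytes. Returns the number of
 * bytes written. */
size_t filter_bytes(const unsigned char *src, size_t n,
                    unsigned char *dst, int (*keep)(unsigned char))
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (keep(src[i]))
            dst[out++] = src[i];
    }
    return out;
}
```

Feeding the two bytes of ß (0xC3 0x9F) through filter_bytes() with
keep_original leaves only the 0xC3, which is then free to combine with
whatever byte comes next; with keep_patched both bytes survive.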