On 2/5/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> On Tue, 9 Jan 2024 01:13:45 -0600
> Dave Kemper <saint.s...@gmail.com> wrote:
>
>> In the message to which I was replying, you were speaking of the
>> sequence of bytes that were part of the input to gpic; in this realm,
>> ECMA-48 is irrelevant. And in any case, the 0x84 byte in question is
>> part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
>> LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
>> somewhere as ECMA-48, something is going wrong.
>>
>> What seems to be going wrong in this instance is that you're passing
>> UTF-8 directly to gpic without first running it through preconv or
>> iconv, resulting in a byte sequence gpic doesn't recognize. You
>> haven't said whether you've tried converting the input before sending
>> it to gpic, or why you're avoiding preconv.
>
> I quote myself:
> "The character emerges from a input file name. So it is missed by
> preconv somewhere, ..."

Since you haven't said what your pipeline is, I can't debug what
preconv is missing or why. But in general, if you're doing something
like:

    someprog | gpic

where "someprog" is outputting UTF-8, then you should change the
pipeline to:

    someprog | preconv -eutf8 | gpic

Like all groff tools, gpic will not recognize UTF-8 input. The
encoding has to be converted before gpic sees it.

> You completely miss the point of the utf8 sequence "ä" passes while
> "Ä" issues.

I didn't miss this. Lennart explained this in his December 28 reply in
this thread, and I reiterated it in my December 29 reply, and again in
my January 2 reply. In short: UTF-8 "ä" in a Latin-1 context is
interpreted as two Latin-1 characters, whereas UTF-8 "Ä" in a Latin-1
context is one Latin-1 character and one invalid (to groff tools)
control character.
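
If it helps to see the bytes, here is a small Python sketch of that difference. It is not part of groff, and the helper name latin1_view is made up for this illustration. It also uses Unicode's "Cc" (control) category as a stand-in for "invalid to groff tools", which is only an approximation of groff's actual input rules.

```python
import unicodedata

def latin1_view(text):
    """Encode text as UTF-8, then classify each byte as Latin-1 would read it.

    Hypothetical helper for illustration only; it does not call groff.
    """
    result = []
    for byte in text.encode("utf-8"):
        ch = bytes([byte]).decode("latin-1")
        # Category "Cc" covers control characters, including the C1 range
        # 0x80-0x9F. Assumption: this approximates what groff tools reject.
        kind = "control" if unicodedata.category(ch) == "Cc" else "printable"
        result.append((hex(byte), kind))
    return result

print(latin1_view("\u00e4"))  # lowercase a with diaeresis
print(latin1_view("\u00c4"))  # capital A with diaeresis
```

The lowercase letter encodes as 0xc3 0xa4, both printable in Latin-1, so it slips through (as two wrong characters). The capital encodes as 0xc3 0x84, and 0x84 falls in the C1 control range, which is the byte being complained about.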