Re: uppercase german umlaut

Dave Kemper Mon, 05 Feb 2024 22:23:47 -0800

On 2/5/24, hoh...@posteo.de <hoh...@posteo.de> wrote:
> On Tue, 9 Jan 2024 01:13:45 -0600
> Dave Kemper <saint.s...@gmail.com> wrote:
>
>> In the message to which I was replying, you were speaking of the
>> sequence of bytes that were part of the input to gpic; in this realm,
>> ECMA-48 is irrelevant.  And in any case, the 0x84 byte in question is
>> part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
>> LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
>> somewhere as ECMA-48, something is going wrong.
>>
>> What seems to be going wrong in this instance is that you're passing
>> UTF-8 directly to gpic without first running it through preconv or
>> iconv, resulting in a byte sequence gpic doesn't recognize.  You
>> haven't said whether you've tried converting the input before sending
>> it to gpic, or why you're avoiding preconv.
>
> I quote myself:
> "The character emerges from a input file name. So it is missed by
> preconv somewhere, ..."


Since you haven't said what your pipeline is, I can't debug what
preconv is missing or why.  But in general if you're doing something
like:

someprog | gpic

where "someprog" is outputting UTF-8, then you should change the pipeline to:

someprog | preconv -eutf8 | gpic

Like the other groff tools (preconv aside), gpic does not recognize
UTF-8 input.  The encoding has to be converted before gpic sees it.
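
As a rough illustration of what that conversion step amounts to, here
is a simplified sketch in Python.  It is not preconv's actual code, and
the function name to_groff_escapes is made up for this example.  The
idea is that every non-ASCII character is replaced by an ASCII-only
\[uXXXX] escape, which the groff tools do understand:

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape.

    Simplified approximation of the conversion step; hypothetical helper,
    not part of groff.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Everything else becomes an ASCII-only escape sequence.
            out.append(f"\\[u{cp:04X}]")
    return "".join(out)


# "\u00c4" is LATIN CAPITAL LETTER A WITH DIAERESIS.
print(to_groff_escapes("\u00c4rger.pic"))
```

The real preconv does more than this (detecting the input encoding, for
one), so treat the above only as a picture of why the output is safe to
hand to gpic.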

> You completely miss the point of the utf8 sequence "ä" passes while
> "Ä" issues.

I didn't miss this.  Lennart explained this in his December 28 reply
in this thread, and I reiterated it in my December 29 reply, and again
in my January 2 reply.  In short: UTF-8 "ä" (bytes 0xC3 0xA4) in a
Latin-1 context is interpreted as two Latin-1 characters, whereas
UTF-8 "Ä" (bytes 0xC3 0x84) in a Latin-1 context is one Latin-1
character and one control character (0x84) that is invalid to groff
tools.
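
If you want to see this for yourself, a small Python sketch like the
following shows how each UTF-8 byte looks when read as Latin-1.  The
helper name latin1_view is made up for this example, and it uses
Python's Unicode database only as a stand-in for what the groff tools
consider printable versus control:

```python
import unicodedata


def latin1_view(ch: str) -> list[tuple[str, str]]:
    """Return (byte, Unicode category) for each UTF-8 byte of ch,
    treating every byte as a single Latin-1 character."""
    return [
        (hex(b), unicodedata.category(chr(b)))
        for b in ch.encode("utf-8")
    ]


# "\u00e4" is lowercase a-diaeresis, "\u00c4" is uppercase A-diaeresis.
print(latin1_view("\u00e4"))  # both bytes are printable Latin-1 characters
print(latin1_view("\u00c4"))  # second byte, 0x84, has category Cc (control)
```

Category "Cc" marks a control character; 0x84 falls in the C1 control
range, which is why the uppercase case trips up gpic and the lowercase
one does not.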
