Re: BOM can ruin your happy groffing experience

Dave Kemper Tue, 21 Nov 2023 08:26:48 -0800

On 11/21/23, Oliver Corff <oliver.co...@email.de> wrote:
> So the first line effectively was:
>
> <feff>.ig
>
> No wonder it did not work. Would it be meaningful to (optionally) tell
> groff to jump over or throw away BOMs it encounters at the beginning of
> a file? Or should sanity and awareness be left with the astute user?


The problem with this sensible idea is that groff expects its input in
ISO 8859-1 (a.k.a. Latin-1) encoding, and the bytes of a UTF-8 BOM
(EF BB BF) are all valid Latin-1 characters (albeit ones unlikely to
appear as the first three bytes of a Latin-1 document).
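In a UTF-8 file, U+FEFF is stored as the three bytes EF BB BF. Decoding those bytes as Latin-1, then re-encoding them as UTF-8 so a terminal can display them, shows roughly what troff sees at the start of the first line: three ordinary characters in front of the period, so ".ig" is no longer a control line. This sketch assumes iconv is installed with an ISO-8859-1 converter:

```shell
# Feed a BOM-prefixed ".ig" line through a Latin-1 -> UTF-8 conversion.
# \357\273\277 is octal for the bytes EF BB BF (U+FEFF encoded in UTF-8).
printf '\357\273\277.ig\n' | iconv -f ISO-8859-1 -t UTF-8
# prints: ï»¿.ig
```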

Giving groff the -k option may act as an ersatz ignore-the-BOM option;
this runs the preconv preprocessor, which is BOM-aware, before troff
itself sees the input.  But if your input is otherwise in Latin-1, this
won't work, because the BOM will make preconv decide the input is
UTF-8.  If your groff input is limited to ASCII, it'll be fine,
because in the ASCII range Latin-1 and UTF-8 look identical.  (The
Chinese characters being only inside .ig blocks, I'm presuming it
doesn't matter for your purposes how these are encoded when they hit
groff.)
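The ASCII condition is easy to check before reaching for -k. A sketch, assuming GNU sed (for the \xHH escapes) and a POSIX tr; non_ascii_bytes is a hypothetical name. It counts the bytes in the range 0x80-0xFF after discarding a leading UTF-8 BOM, so a result of 0 means the Latin-1 vs. UTF-8 question doesn't arise:

```shell
# non_ascii_bytes FILE: print how many bytes of FILE lie outside ASCII,
# not counting a UTF-8 BOM (EF BB BF) at the very start of line 1.
# LC_ALL=C makes sed and tr treat the input as raw bytes.
non_ascii_bytes() {
    LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$1" |
        LC_ALL=C tr -d '\000-\177' |   # keep only bytes 0x80-0xFF
        wc -c
}
```

For example, non_ascii_bytes notes.ms (a placeholder file name) printing 0 suggests groff -k is safe whichever encoding preconv guesses.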

If groff itself had a command-line option specifically to tell it to
skip a leading BOM, that would still require you to know the BOM was
there to know the option was needed, which wouldn't have saved you the
hassle of debugging your problem.  And once you know the BOM is there,
you can create an alias that runs a simple sed (e.g., with GNU sed,
sed '1s/^\xEF\xBB\xBF//', which deletes a UTF-8 BOM) before running groff.
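A shell function is a little more convenient than an alias here, since the file name has to go to sed rather than to groff. A minimal sketch, assuming GNU sed (for the \xHH escapes) and a POSIX shell; the names strip_bom and groff_nobom are made up for this example, and the file-first argument order is an arbitrary choice:

```shell
# strip_bom [FILE...]: copy the input to stdout, deleting a UTF-8 BOM
# (bytes EF BB BF) only if it is the very first thing on line 1.
# LC_ALL=C makes sed match raw bytes regardless of the file's encoding.
strip_bom() {
    LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$@"
}

# groff_nobom FILE [GROFF-OPTIONS...]: strip a leading BOM from FILE,
# then hand the result to groff on standard input.
groff_nobom() {
    f=$1
    shift
    strip_bom "$f" | groff "$@"
}
```

Usage would look like groff_nobom notes.ms -ms -Tpdf > notes.pdf, where notes.ms is a placeholder. Files without a BOM pass through unchanged, so the function is harmless to use unconditionally.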
