On 11/21/23, Oliver Corff <oliver.co...@email.de> wrote:
> So the first line effectively was:
>
> <feff>.ig
>
> No wonder it did not work. Would it be meaningful to (optionally) tell
> groff to jump over or throw away BOMs it encounters at the beginning of
> a file? Or should sanity and awareness be left with the astute user?

The problem with this sensible idea is that groff input is ISO 8859-1 (a.k.a. Latin-1) encoding, and FE and FF are both valid Latin-1 characters (albeit ones unlikely to appear as the first two bytes of a Latin-1 document).

Giving groff the -k option may act as an ersatz ignore-the-BOM option; this will run the preconv preprocessor, which is BOM-aware, before running groff itself. But if your input is otherwise in Latin-1, this won't work, because the BOM will make preconv decide the input is UTF-8. If your groff input is limited to ASCII, it'll be fine, because in the ASCII range Latin-1 and UTF-8 look identical. (The Chinese characters being only inside .ig blocks, I'm presuming it doesn't matter for your purposes how these are encoded when they hit groff.)

If groff itself had a command-line option specifically to tell it to skip a leading BOM, that would still require you to know the BOM was there to know the option was needed, which wouldn't have saved you the hassle of debugging your problem. And once you know the BOM is there, you can create an alias that runs a simple sed (e.g., sed 1s/^\\xFE\\xFF//) before running groff.
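
For what it's worth, here is a rough sketch of that sed idea as a shell function rather than an alias. The bytes of a BOM depend on the file's encoding: U+FEFF is FE FF in UTF-16 big-endian but EF BB BF in UTF-8, so this version tries both. The name strip_bom is made up for illustration, and it still assumes the rest of the file is something groff can read.

```shell
#!/bin/sh
# strip_bom (hypothetical helper): copy stdin to stdout, removing a leading
# byte order mark from the first line if one is present.
strip_bom() {
    utf8_bom=$(printf '\357\273\277')   # EF BB BF: U+FEFF encoded as UTF-8
    utf16be_bom=$(printf '\376\377')    # FE FF: U+FEFF encoded as UTF-16 big-endian
    # LC_ALL=C makes sed match raw bytes rather than multibyte characters.
    LC_ALL=C sed -e "1s/^$utf8_bom//" -e "1s/^$utf16be_bom//"
}

# Usage sketch (file names are placeholders):
#   strip_bom < input.ms | groff -ms -Tpdf > output.pdf
```

Building the patterns with printf and octal escapes avoids relying on the \x escapes in the sed script, which not every sed implementation accepts. As noted above, a Latin-1 file that really does begin with those bytes would lose them too.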