Hi Alexis,

Thanks for working on this use case!

At 2023-06-08T13:40:36+0200, Alexis wrote:
> When using the generated Computer Modern Unicode fonts with pdfroff,
> grops complains about invalid input characters in the pfb files, e.g.:
>
> % pdfroff -t -ms -mcmu doc/ms.ms > doc/ms.pdf
> grops:$GROFF_FONT_PATH/devps/../cm-unicode-0.7.0/cmunbi.pfb
> (doc/ms.ms):3047: invalid input character code 3
>
> What could be the cause for that?

To all appearances, it is a diagnostic from one of the following two
places in the source.

src/devices/grops/psrm.cpp:429:
    error("invalid input character code %1", int(c));
src/libs/libgroff/font.cpp:116:
    error("invalid input character code %1", int(c));

The second diagnostic tests for valid input characters to the
formatter.  These are documented (at least as of groff 1.23.0 RC yadda)
in our Texinfo manual and in the groff(7) page.

    Invalid input characters are a subset of control characters (from
    the sets "C0 Controls" and "C1 Controls" as Unicode describes
    them).  When troff encounters one in an identifier, it produces a
    warning in category "input" (see section "Warnings" in troff(1)).
    They are removed during interpretation: an identifier "foo",
    followed by an invalid character and then "bar", is processed as
    "foobar".

    On a machine using the ISO 646, 8859, or 10646 character
    encodings, invalid input characters are 0x00, 0x08, 0x0B,
    0x0D-0x1F, and 0x80-0x9F.  On an EBCDIC host, they are 0x00-0x01,
    0x08, 0x09, 0x0B, 0x0D-0x14, 0x17-0x1F, and 0x30-0x3F.  Some of
    these code points are used by troff internally, making it
    non-trivial to extend the program to accept UTF-8 or other
    encodings that use characters from these ranges.

That the diagnostic above is prefixed with neither "error" nor
"warning" suggests to me that it is groff 1.22.4 output.  It also looks
like I might want to recast these two diagnostics slightly so that they
can be told apart.

Character code 3 _is_ valid GNU troff input, so by elimination I begin
to suspect the source of the diagnostic is the first file, psrm.cpp.
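
As a quick sanity check on that elimination, the ranges quoted above for
ISO 646/8859/10646 hosts can be written down as a small Python sketch.
The names INVALID_ISO, is_invalid_troff_input, find_invalid, and
strip_invalid are made up for illustration.  The sketch mirrors the
documentation only; it is not the table in font.cpp, and it says nothing
about the separate table in psrm.cpp.

```python
# Invalid input character codes on ISO 646/8859/10646 hosts, transcribed
# from the documentation quoted above (an assumption: this is not copied
# from the table in src/libs/libgroff/font.cpp).
INVALID_ISO = frozenset(
    [0x00, 0x08, 0x0B]
    + list(range(0x0D, 0x20))  # 0x0D-0x1F
    + list(range(0x80, 0xA0))  # 0x80-0x9F
)


def is_invalid_troff_input(code: int) -> bool:
    """Hypothetical helper: True if `code` is in the documented set."""
    return code in INVALID_ISO


def find_invalid(data: bytes) -> list:
    """Return (offset, code) pairs for every documented-invalid byte."""
    return [(i, b) for i, b in enumerate(data) if b in INVALID_ISO]


def strip_invalid(data: bytes) -> bytes:
    """Drop documented-invalid bytes, as happens inside identifiers."""
    return bytes(b for b in data if b not in INVALID_ISO)
```

With these definitions, is_invalid_troff_input(3) is False, which is the
point made above, and strip_invalid(b"foo\x0bbar") gives b"foobar", as
in the identifier example from the manual.
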
"psrm" is short for PostScript Resource Manager. It's responsible for loading fonts and whatever else can be embedded in PostScript. It has its own table of valid input characters, and code 3 is _not_ valid there. https://git.savannah.gnu.org/cgit/groff.git/tree/src/devices/grops/psrm.cpp?h=1.23.0.rc4#n38 > Does someone know of afix or workaround? It sounds like we need a PostScript expert to tell us if grops's table is accurate. If it is, then it sounds like something is producing a corrupt PFB file. A workaround might be to convert it to PFA instead. groff's own pfbtops(1) command can do this. Ghostscript provides pfbtopfa(1) too. > What follows are some thoughts on adding support for font aliases to > groff. > > With groff 1.22.4 doing less validation of font files it was possible > to simply create symlink to a font file and that symlink would serve > as a font alias. > > Since groff 1.23.0 does more validation of the font files than groff > 1.22.4 each font alias needs to be a separate file on disk, although > only the filename and value for the name directive differ. > > Is it possible and feasible to change the format of the name directive > so that it allows for several names/aliases? > > This would allow a single font file to be used for several font > aliases. > > Imagine a font file cmunrm containing the following name directive: > > name cmunrm CMUSerifR CMUSerifRoman > > and CMUSerifR and CMUSerifRoman being symlinks to cmunrm. > > This would make it easy to add a new font alias to a font and show > the relation of fonts and font aliases on the file system too. It is probably feasible, but... > So far I've only looked into supporting multiple names and aliases > during font file parsing in function font::load from > src/libs/libgroff/font.cpp:779 but know too little about groff's font > loading mechanism in general. Any pointers are greatly appreciated. 
>
> If folks think that this might be a useful change I'd be happy to
> learn what other code parts might need changing and possible have
> a first go at it.
>
> What are your thoughts?

I think there is already a mechanism for this.  When I learned of it, I
found it a surprisingly old one, dating back to what we might call "late
Kernighan troff".

Also, there is already some precedent for shipping relatively small
supporting macro files as companions to font descriptions.  This
precedent is "ec.tmac", which has been around for many years as support
for the EC fonts for TeX and our grodvi(1) output driver.

So if a font packager for groff were willing to maintain and supply a
macro file as well, they could alias the font when it is mounted.

Quoting our Texinfo manual from the groff Git master branch...

 -- Request: .fp pos id [font-description-file-name]
 -- Register: \n[.f]
 -- Register: \n[.fp]
     Mount a font under the name ID at mounting position POS, a
     non-negative integer.  When the formatter starts up, it reads the
     output device's description to mount an initial set of faces, and
     selects font position 1.  Position 0 is unused by default.  Unless
     the FONT-DESCRIPTION-FILE-NAME argument is given, ID should be the
     name of a font description file stored in a directory corresponding
     to the selected output device.  GNU 'troff' does not traverse
     directories to locate the font description file.

     The optional third argument enables font names to be aliased, which
     can be necessary in compatibility mode since AT&T 'troff' syntax
     affords no means of identifying fonts with names longer than two
     characters, like 'TBI' or 'ZCMI', in a font selection escape
     sequence.  *Note Compatibility Mode::.  You can also alias fonts on
     mounting for convenience or abstraction.  (See below regarding the
     '.fp' register.)

          .fp \n[.fp] SC ZCMI
          Send a \f(SChand-written\fP thank-you note.
          .fp \n[.fp] Emph TI
          .fp \n[.fp] Strong TB
          Are \f[Emph]these names\f[] \f[Strong]comfortable\f[]?

     'DESC', 'P', and non-negative integers are not usable as font
     identifiers.

     The position of the currently selected font (or abstract style) is
     available in the read-only register '.f'.  It is associated with
     the environment (*note Environments::).

     You can copy the value of '.f' to another register to save it for
     later use.

          .nr saved-font \n[.f]
          ... text involving many font changes ...
          .ft \n[saved-font]

     The index of the next (non-zero) free font position is available in
     the read-only register '.fp'.  Fonts not listed in the 'DESC' file
     are automatically mounted at position '\n[.fp]' when selected with
     the 'ft' request or '\f' escape sequence.  When mounting a font at
     a position explicitly with the 'fp' request, this same practice
     should be followed, although GNU 'troff' does not enforce this
     strictly.

Dave Kemper and I mused about these issues fairly extensively in
<https://savannah.gnu.org/bugs/?61423>.

Does this show the way to a better solution?  Is there anything unclear
above that I might make more lucid?

Regards,
Branden