Update of bug #65601 (group groff):

                  Status:                    None => Need Info
                 Summary: Bogus 'bogus composite' errors introduced by commit 6008b6b7aa => [troff] bogus 'bogus composite' errors introduced by commit 6008b6b7aa
    _______________________________________________________

Follow-up Comment #2:

[comment #0 original submission:]
> If preconv produces a valid composite character groff should not
> reject it. Of course if the composite is not available in any
> available font

Unfortunately that's not the way GNU _troff_ works. (Or I'm not
understanding the bug report.)

The list of composite characters is global.

Here's what our Texinfo manual says in Git HEAD.

 -- Escape sequence: \[base-glyph combining-component ...]
     ...
     GNU 'troff' resolves '\[...]' with more than a single component as
     follows:

        * Any component that is found in the GGL [groff glyph list
          --GBR] is converted to the 'uXXXX' form.

        * Any component 'uXXXX' that is found in the list of
          decomposable glyphs is decomposed.

        * The resulting elements are then concatenated with '_' in
          between, dropping the leading 'u' in all elements but the
          first.

     No check for the existence of any component (similar to 'tr'
     request) is done.

     Examples:

     '\[A ho]'
          'A' maps to 'u0041', 'ho' maps to 'u02DB', thus the final
          glyph name would be 'u0041_02DB'. This is not the expected
          result: the ogonek glyph 'ho' is a spacing ogonek, but for a
          proper composite a non-spacing ogonek (U+0328) is necessary.
          Looking into the file 'composite.tmac', one can find
          '.composite ho u0328', which changes the mapping of 'ho' while
          a composite glyph name is constructed, causing the final glyph
          name to be 'u0041_0328'.

     '\[^E u0301]'
     '\[^E aa]'
     '\[E a^ aa]'
     '\[E ^ ']'
          '^E' maps to 'u0045_0302', thus the final glyph name is
          'u0045_0302_0301' in all forms (assuming proper calls of the
          'composite' request).

     It is not possible to define glyphs with names like 'A ho' within a
     'groff' font file. This is not really a limitation; instead, you
     have to define 'u0041_0328'.
...
 -- Request: .composite c1 c2
     Map ordinary or special character name C1 to C2 when C1 is a
     combining component in a composite character. See above for
     examples.
     This is a strict rewriting of the special character name; no check
     is performed for the existence of a glyph for either. Typically,
     'composite' is used to map a spacing character to a combining one.
     A set of default mappings for many accents can be found in the file
     'composite.tmac', loaded by the default 'troffrc' at startup.

     You can obtain a report of mappings defined by 'composite' on the
     standard error stream with the 'pcomposite' request. *Note
     Debugging::.

> Personally I see little value in this error,

I do find value in it; in the ChangeLog entry, I provided my rationale.
In the commit message I even provided exhibits of cases that should have
produced a diagnostic but did not.

    [troff]: Diagnose bogus composite character escape sequences.

    That is, when a composite character escape sequence like \[a ~] has
    a bogus modifier (as opposed to base) character, meaning one that
    has not been defined as the source _or_ destination of a `composite`
    request, warn about it. For instance, \[a $] is nonsense, barring a
    request like `.composite $ \[uFF00]`, which would map `$`, when used
    as a modifier character in a composite special character escape
    sequence, to U+FF00, which would be a modifier form of the dollar
    sign in an alternate universe.
    ...
    Input:
    .nf
    \[A a~]
    \[A ~]
    \[u0041_0301]
    \[u0041_007E] \" should fail because 007E is explicitly spacing
    \[u0041_0041] \" same reason, more obviously
    \[u0041_0301_0301] \" should fail, would have a different meaning
    \[u0041_007E_0301] \" both problems above

    groff 1.23.0 and earlier:
    $ groff -T ps -z EXPERIMENTS/composite_character_construction.groff
    troff:...:5: warning: special character 'u0041_007E' not defined
    troff:...:6: warning: special character 'u0041_0041' not defined
    troff:...:7: warning: special character 'u0041_0301_0301' not defined
    troff:...:8: warning: special character 'u0041_007E_0301' not defined
    $ groff -Tutf8 -z EXPERIMENTS/composite_character_construction.groff
    [no output due to Savannah #65109]

    Now:
    $ ./build/test-groff -T ps -z EXPERIMENTS/composite_character_construction.groff
    troff:...:5: warning: special character 'u0041_007E' not defined
    troff:...:6: error: cannot format glyph: 'u0041_0041' is not a valid composite character
    troff:...:7: warning: special character 'u0041_0301_0301' not defined
    troff:...:8: warning: special character 'u0041_007E_0301' not defined
    $ ./build/test-groff -T utf8 -z EXPERIMENTS/composite_character_construction.groff
    troff:...:6: error: cannot format glyph: 'u0041_0041' is not a valid composite character

> the existing error reporting of a special character not defined is
> more helpful since if you find a font which contains the correct
> glyph, the error will be gone.

Is this true in full generality? Does it also apply to output devices
that don't even have a "charset" section in their fonts because they're
"unicode" [sic] devices?

groff_font(5):
       unicode
              The output device supports the complete Unicode
              repertoire. This directive is useful only for devices
              which produce character entities instead of glyphs.

              If unicode is present, no charset section is required in
              the font description files since the Unicode handling
              built into groff is used.
              However, if there are entries in a font description file’s
              charset section, they either override the default mappings
              for those particular characters or add new mappings
              (normally for composite characters).

              The utf8, html, and xhtml output devices use this
              directive.

(I feel that that's a badly named directive. As I understand it, it more
precisely means that a different glyph resolution mechanism is used--or
none at all, instead assuming that the device is happy to attempt to
combine any sequence of Unicode code points as a grapheme cluster.)

> I'm sure there are users capable of creating a font with all sorts of
> weird composite glyphs, why should we police what they can do?

Because we have no mechanism for defining font-specific composite
character *components*. (Meaning: "foo" in `\[a foo]`; contrast with the
composed composite characters contemplated by the second paragraph of
the "unicode" directive description quoted above.)

Maybe we should, but that in turn would mean having font-specific macro
files that users' documents would need to load. And we'd probably need a
tool to generate them.

Might be better/more scalable to ask authors of such documents to issue
the `composite` requests themselves. We can add commonly used ones that
we are presently missing to "composite.tmac".

My anticipation of this problem is why I added a (rather, stopped
discouraging use of an existing) mechanism to delete composite character
mappings and a new request for reporting the ones the formatter knows
about.

Or people can bypass this escape sequence syntax entirely and spell
their grapheme clusters in Unicode directly as is already supported.

Our Texinfo manual again:

   * A glyph representing more than a single input character is named

          'u' COMPONENT1 '_' COMPONENT2 '_' COMPONENT3 ...

     Example: 'u0045_0302_0301'.

There may be an opportunity for some terminological revision here. This
section of the manual is one of those I haven't finished my first
revision pass on yet.
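
To make the naming rules quoted from the manual concrete, here is a rough
Python sketch of how a `\[base modifier ...]` escape sequence could be
turned into a `uXXXX_YYYY` glyph name, and of the modifier check as I
read it from the commit message (a modifier is acceptable if it is the
source _or_ destination of a `composite` request). It is only a model:
the tables are tiny illustrative subsets, and the names `GGL`,
`DECOMPOSE`, `COMPOSITE`, `composite_glyph_name`, and `bogus_modifiers`
are made up for this example; they are not the identifiers or data
structures used in "input.cpp".

```python
import re

# Illustrative subsets only (assumptions for this sketch): the real data
# lives in groff's glyph list, its decomposition list, and composite.tmac.
GGL = {"A": "0041", "E": "0045", "^E": "00CA", "ho": "02DB", "aa": "00B4",
       "a^": "02C6", "^": "005E", "'": "0027", "~": "007E"}
DECOMPOSE = {"00CA": ["0045", "0302"]}      # U+00CA = E + combining circumflex
# Models `.composite c1 c2`, keyed by the code point that c1 resolves to.
COMPOSITE = {"02DB": "0328", "00B4": "0301", "0027": "0301",
             "02C6": "0302", "005E": "0302", "007E": "0303"}


def to_code(component):
    """Resolve one component to hex digits, without the leading 'u'."""
    if re.fullmatch(r"u[0-9A-F]{4,6}", component):
        return component[1:]
    return GGL[component]                   # KeyError if the name is unknown


def composite_glyph_name(escape_body):
    """Build the glyph name for the text between '\\[' and ']'."""
    codes = []
    for index, component in enumerate(escape_body.split()):
        code = to_code(component)
        if index > 0:                       # only modifiers are remapped
            code = COMPOSITE.get(code, code)
        codes.extend(DECOMPOSE.get(code, [code]))
    return "u" + "_".join(codes)


def bogus_modifiers(glyph_name):
    """Return modifier code points unknown to every `composite` mapping."""
    known = set(COMPOSITE) | set(COMPOSITE.values())
    return [code for code in glyph_name[1:].split("_")[1:]
            if code not in known]


print(composite_glyph_name("A ho"))         # u0041_0328
print(composite_glyph_name("E ^ '"))        # u0045_0302_0301
print(bogus_modifiers("u0041_0041"))        # ['0041']
print(bogus_modifiers("u0041_007E"))        # []
```

Under these assumptions, only 'u0041_0041' from the commit message's
exhibit has a modifier that no `composite` mapping mentions, which is
consistent with it being the one input line that draws the new error.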
I still have things to learn. Maybe you can shed some light where things
are dark for me.

commit 2c76a931b81b1e22dd419c7027d3517325c23193
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Wed Jan 17 14:02:28 2024 -0600

    [troff]: Fix Savannah #64937 (del composite char).

    * src/roff/troff/input.cpp (map_composite_character): Stop throwing
    diagnostic message when `composite` request invoked with only one
    argument. This has long worked just fine to delete a composite
    character mapping. That is something a (rare) user might
    conceivably want to do.

    Fixes <https://savannah.gnu.org/bugs/?64937>.

commit e958bb4fc65326dd9cd0d775e96aff15e944795e
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Wed Jan 17 13:49:40 2024 -0600

    [troff]: Implement new `pcomposite` request.

    * src/roff/troff/input.cpp (report_composite_characters): Add.
    (init_input_requests): Wire up `pcomposite` request name to
    `report_composite_characters()`.

    * doc/groff.texi (Colors, Debugging):
    * man/groff.7.man (Request short reference, Debugging):
    * man/groff_diff.7.man (New requests, Debugging):
    * NEWS: Document it.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65601>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/