Follow-up Comment #41, bug #63074 (group groff):

Hi Deri,

At 2024-11-12T13:47:33-0500, Deri James wrote:
> Follow-up Comment #40, bug #63074 (group groff):
>
> [comment #38 comment #38:]
>> One could envision three levels of support for encoding arbitrary
>> characters.
>>
>> 1. By Unicode code point. Reusing _groff_'s own syntax for Unicode special
>> character escape sequences was irresistibly tempting, so that's what I
>> implemented. We have that in Git HEAD.
>> 2. By (simple) _groff_ special character escape sequence, like \['o'] (in
>> "Cicerón"). We have that in Git HEAD too.
>> 3. By composite special character escape sequence, like "\[o aa]", which we
>> might also use to write "Cicerón"--"Cicer\[o aa]n". We don't have that. It
>> proved to be difficult. (The formatter warns if it encounters this syntax
>> where it can't handle it.)
>
> If you implement (3) you realise that searching a document for
> "Cicerón" (which was formed using \[o aa]) may not be found.

I think that's not correct, except in `output`/`\!` arguments, where you
get exactly the "grout" you ask for.

Here's the commented out test, preceded by the corresponding input:

input='.
.ds h Caf\[e aa] Hyphen-Minus and \[rs]\[u2010]
\X"ps:exec 5:\\X [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark"
\!x X ps:exec 6:\! [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.device ps:exec 7:device [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.output x X ps:exec 8:output [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.'

#echo "checking practical bookmarking with device request" >&2
#printf "%s\n" "$output" \
#    | grep -Fqx 'x X ps:exec 7:device [/Dest /pdf:bm1 /Title (Caf\[u00E9] Hyphen-Minus and \\[u2010]) /Level 1 /OUT pdfmark' \
#    || wail

As you can see, I expect `\[e aa]` to be transformed to \[u00E9]. But this
doesn't work presently, because the `device` request reads its argument in
copy mode, unlike the `\X` escape sequence.

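To make the expected transformation concrete, here is a minimal sketch of the
underlying Unicode equivalence. It is not how the formatter does it; it only
shows that a base letter followed by a combining acute accent composes (under
NFC) to the single code point U+00E9. The `COMBINING` table and the `compose`
helper are hypothetical names made up for this illustration; only the `aa`
entry corresponds to the input discussed here.

```python
import unicodedata

# Hypothetical lookup: composite component name -> combining character.
# Only "aa" (acute accent) comes from the discussion; the table is an
# assumption for illustration, not groff's internal data.
COMBINING = {"aa": "\u0301"}


def compose(base: str, accent: str) -> str:
    """Return a Unicode special character escape for base + accent."""
    decomposed = base + COMBINING[accent]
    # NFC merges the pair into one code point when a precomposed form exists.
    composed = unicodedata.normalize("NFC", decomposed)
    # Several code points are joined with "_", as in \[u0065_0301].
    points = "_".join(f"{ord(ch):04X}" for ch in composed)
    return "\\[u" + points + "]"


print(compose("e", "aa"))  # the spelling the test above expects
print(compose("q", "aa"))  # no precomposed form, so two code points remain
```

Whatever spelling the input uses, a scheme like this yields one canonical
escape per character, which is what keeps searching simple.
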
The goal is to have `\[u00E9]` show up in the output no matter how it was
spelled in the input: `\['e]`, `\[e aa]`, `\[e ']`, `\[u0065_0301]`, or
indeed `\[u00E9]`. That should present no complications for searching.
(We may want to someday migrate to a different policy for Unicode
decomposition--I leave that problem for when it ripens.)

> So I prefer not allowing composites, unless you have a zinger argument
> for them.

My zinger is simply that no user should have to remember this absurdly
esoteric detail. Right now, they get a warning if they bump into it.

> Yes, both of these work perfectly:-
>
> printf "Caf\\['e]\n.br\n.output x X ps:exec [/Dest /pdf:bm1 /Title (Eat at
> Joe's Caf\\['e].) /Level 1 /OUT pdfmark\n" | test-groff -T pdf | okular -
>
> printf "Caf\\['e]\n.br\n.device ps:exec [/Dest /pdf:bm1 /Title (Eat at Joe's
> Caf\\['e].) /Level 1 /OUT pdfmark\n" | test-groff -T pdf | okular -

That was one of my goals! I'm pleased that _something_ worked out after
all this struggle. :-O

Best,
Branden

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/