Follow-up Comment #30, bug #64484 (group groff):

At 2024-11-16T12:22:25-0500, Deri James wrote:
> Follow-up Comment #29, bug #64484 (group groff):
>
> Branden leans towards rejecting things even though the users intention
> is clear. For groff there are no spaces, only horizontal movements,

I reject this claim as false.

It is rebuttable in at least 3 respects:

1.  the formatter's input language;
2.  the formatter's internal representation form ("nodes"); and
3.  the formatter's output page description language.

Case one
--------

I've said this before but I guess it bears reiterating:

In groff (and AT&T troff), spaces are discardable; horizontal motions
are not.

Further, spaces and horizontal motions are distinguishable by the
different representations they have in input.

A space on a text line interpolates as "ordinary" space, which is one
word space in width (and is configurable both by a font description file
and the `ss` request) and may be adjustable.

Then we have these:

     \space  Move right one inter‐word space.
     \~      Insert an unbreakable, adjustable space.
     \0      Move right by the width of a numeral in the current font.
     \|      Move one‐sixth em to the right on typesetters.
     \^      Move one‐twelfth em to the right on typesetters.
     \h'N'   Horizontally move the drawing position by N ems (or
             specified units); | may be used.  Positive motion is
             rightward.

Case two
--------

src/roff/troff/node.h:179:class space_node : public node {
src/roff/troff/node.h:221:class word_space_node : public space_node {
src/roff/troff/node.h:245:class unbreakable_space_node : public word_space_node {
src/roff/troff/node.h:327:class hmotion_node : public node {
src/roff/troff/node.h:363:class space_char_hmotion_node : public hmotion_node {
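
As a rough illustration only, the inheritance relationships in that
listing can be mirrored in a few lines of Python.  Only the class names
and their parents are taken from node.h; the empty bodies and the
`is_space` helper are hypothetical and stand in for nothing in the real
formatter.

```python
# Mirror of the inheritance shown in the node.h listing above.
# Class bodies are empty placeholders; only the parent/child
# relationships come from the source.
class node: pass
class space_node(node): pass
class word_space_node(space_node): pass
class unbreakable_space_node(word_space_node): pass
class hmotion_node(node): pass
class space_char_hmotion_node(hmotion_node): pass


def is_space(n):
    """Hypothetical helper: is this node on the space_node branch?"""
    return isinstance(n, space_node)


print(is_space(unbreakable_space_node()))   # True
print(is_space(space_char_hmotion_node()))  # False
```

The two branches share only `node` as a common ancestor, so a space and
a horizontal motion are distinguishable by type alone.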

Case three
----------

In the output language, 'w' advises the reader and/or output driver of a
word boundary.  This can be useful because a horizontal motion alone
does not suffice to identify these.  Kerning, for instance, can confuse
the issue.
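
To sketch why 'w' matters to anything reading the output, here is a
small Python fragment that rebuilds text from the handful of commands
appearing in the example quoted further down.  It is an assumption-laden
toy: it handles only 't', 'w', and 'h', one command group per line, and
the function name `words_from_grout` is made up for this illustration.
It is not how any output driver is actually implemented.

```python
def words_from_grout(lines):
    """Rebuild text from a tiny subset of device-independent output.

    Handles only three commands, as they appear in the quoted example:
      t<text>  typeset <text>
      w        word boundary (may be followed by a motion, as in 'wh2500')
      h<n>     horizontal motion with no word boundary
    Everything else is ignored.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if line.startswith("t"):
            pieces.append(line[1:])
        elif line.startswith("w"):
            pieces.append(" ")  # a word boundary was announced
        elif line.startswith("h"):
            pass  # motion alone: no word boundary is implied
    return "".join(pieces).rstrip()


sample = ["tPh:", "wh2500", "t01632", "h5000", "t444666", "wh2500"]
print(words_from_grout(sample))  # Ph: 01632444666
```

The bare `h5000` between "01632" and "444666" carries no 'w', so a
reader relying on word boundaries sees those digits as one word, however
wide the gap is on the page.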

> so should the bookmark "Chapter 1" be rejected because it does contain
> a horizontal movement!

I'd warn about it, at least.

> Elsewhere, in terminal output, space and \0 both result in a one
> character cell space.

As you know well, there are output devices that are not terminals.

> I am sure the user would expect the same when outputting text to a
> bookmark.

Users expect their desires to be perfectly enacted no matter what the
quality of the input they give the system.  There are limits to how well
we can accommodate such expectations, particularly when two different
users have different, but defensible, ones from the same inputs.

>
> Let's look at an example:-
>
> .ds dj "Ph: 01632\0\&444666
> \*[dj]
> \X'ps:exec [/Dest /pdf:bm1 /Title (\*[dj]) /Level 1 /OUT pdfmark'
> .\" .device ps:exec [/Dest /pdf:bm1 /Title (\*[dj]) /Level 1 /OUT pdfmark
> .\" .output x X ps:exec [/Dest /pdf:bm1 /Title (\*[dj]) /Level 1 /OUT pdfmark
>
> Using HEAD & .device
>
> tPh:
> wh2500
> t01632
> h5000
> t444666
> wh2500
> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 016320\&444666) /Level 1 /OUT pdfmark
>
> 1.23.0 & .device
>
> tPh:
> wh2500
> t01632
> h5000
> t444666
> wh2500
> V12000
> H150340
> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 01632\0\&444666) /Level 1 /OUT pdfmark
>
> Using HEAD & \X''
>
> troff:0.trf:3: warning: a horizontal motion is not encodable in
> device-independent output
>
> tPh:
> wh2500
> t01632
> h5000
> t444666
> wh2500
> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 01632444666) /Level 1 /OUT pdfmark
>
> Using 1.23.0 & \X''
>
> troff:0.trf:3: warning: a horizontal motion is not encodable in
> device-independent output
>
> tPh:
> wh2500
> t01632
> h5000
> t444666
> wh2500
> V12000
> H150340
> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 01632444666) /Level 1 /OUT pdfmark
>
> Using HEAD & .output
>
> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 01632\0\&444666) /Level 1 /OUT pdfmark
>
> The only one which "works" is the final one, i.e. produces what the
> author would reasonably expect - a space within the phone number - for
> both the text and the bookmark.

It looks to me like you've added a feature (I assume this is gropdf) to
interpret a bespoke selection of *roff input escape sequences within
certain device extension commands tagged "ps:exec".

Where is this behavior documented?  Should *roff users in general expect
output drivers to implement *roff language interpreters, in whole or
part?

> Worryingly .device in HEAD manages to convert \0 to just "0"
> which does seem wrong.

Yes, that does seem odd and is worth investigating.  Probably the dummy
character should be silently discarded too.  And thus likely some other
things, like `\)`.  Argh.  This is the "switch out of copy mode back
into interpretation mode, or some kind of 'mode 3'" death march you
counseled me not to waste my time with, characterizing it as a
self-imposed goal that would only delay the release.  I believe the
implication was that no one else would care about it.

Well, on the bright(?) side, the release might be delayed for a while
anyway, given that I have some documentation to write and, in your
assessment, a low order of intellect with which to compose it.

https://lists.gnu.org/archive/html/groff/2024-11/msg00131.html

> There have always been differences between \X and .device but after
> Branden's extensive changes in this area I'm not sure our
> documentation has caught up, since it appears to now say there is no
> difference,

Where does it say that?

> but the above shows there is.

So does a unit test:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/tests/device-control-special-character-handling.sh?id=f107252c3c0d3b70236e0db5a56238a406231bf5#n180

and a diagnostic message:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?id=f107252c3c0d3b70236e0db5a56238a406231bf5#n6046

> commit 2548c4659c appears to be the culprit, also affects -T html.

How does grohtml interpret the following?

> x X ps:exec [/Dest /pdf:bm1 /Title (Ph: 01632\0\&444666) /Level 1 /OUT

Are there tags it _does_ recognize where it interprets any of:

     \space
     \~
     \0
     \|
     \^
     \h'N'

?



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64484>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
