Re: man -Tps wraps lines after one word

Nathan Carruth Mon, 21 Jul 2025 20:30:01 -0700

On Fri, Jul 18, 2025 at 06:06:55PM +0200, Ingo Schwarze wrote:
> Hello Jan,
> 
> Jan Stary wrote on Fri, Jul 18, 2025 at 03:22:44PM +0200:
> 
> > It seems the ps/pdf output wraps a line of text
> > after each word. This happens with every manpage.
> 
> Oops.  Fixed with the commit appended below.
> 
> I was so scrupulous about testing ASCII, UTF-8, and HTML output
> that i totally forgot sufficiently testing PostScript and PDF.
> 
> If anyone has an idea how to regress/ test PostScript or PDF
> output, i'd be interested in hearing about it.  Diffing complete
> output files is not an option, and even diffing *parts* of
> output files (like it is done for HTML) isn't either.
> That would be over-testing because PostScript and PDF
> files contain so many gory details that can change without
> the output becoming wrong, and that are actually likely to
> change as a result of minor code changes, so we would
> constantly get massive churn in the test suite.
> Also, whatever is done needs to work with tools that are
> available in the base system.


I doubt there is such a tool.

Having spent a fair amount of time looking into this problem (in the
context of diffing PDF versions of mathematics papers/interviews), my
take is that such a tool would require running a tree-based diffing
algorithm on the internal structure of the PDFs. Even then, the
complexity of the PDF format makes any general comparison very tricky.

More pragmatically, in my experience diffing PDFs also runs into issues
with the page-based structure of PDF. For example, suppose I have
versions v1 and v2, and v2 adds a line in the middle of p. 1. Then the
last line of v1p1 becomes the first line of v2p2, and so on, so that
(almost) _every succeeding page_ of the file differs in two lines, one
at the top and one at the bottom. The more that is added, the worse it
gets. The only way I can see around this would be to internally reflow
the body text -- which might require heuristics to strip headers and
footers -- into an unpaginated format before computing the difference.

(After writing this paragraph I remembered you mentioned using pdftotext,
so perhaps you already have some method for dealing with headers/footers
and changes in pagination?)
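
To make the reflow idea concrete, here is a rough sketch in Python. It is
only an illustration, not something that would satisfy the base-system
requirement. It assumes the text has already been extracted with pages
separated by form feeds (which, as far as I know, is what pdftotext emits),
and it assumes the header and footer each occupy a fixed number of
non-blank lines per page. The names reflow() and body_diff() are made up
for this example.

```python
import difflib


def reflow(text, header_lines=1, footer_lines=1):
    """Join form-feed-separated pages into one list of body lines."""
    body = []
    for page in text.split("\f"):
        # Ignore blank lines so vertical spacing does not count as content.
        lines = [ln.rstrip() for ln in page.splitlines() if ln.strip()]
        if not lines:
            continue
        # Heuristic (an assumption): the header and footer take up a fixed
        # number of non-blank lines at the top and bottom of every page.
        end = len(lines) - footer_lines
        body.extend(lines[header_lines:end])
    return body


def body_diff(old, new):
    """Return '-'/'+' prefixed lines that differ between the two bodies."""
    a, b = reflow(old), reflow(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.extend("-" + ln for ln in a[i1:i2])
        if tag in ("insert", "replace"):
            out.extend("+" + ln for ln in b[j1:j2])
    return out
```

With something like this, a single line added on p. 1 should show up as a
single difference rather than as two changed lines on every later page.
Real headers and footers that vary in height would need a smarter
heuristic than a fixed line count.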

Nathan

> 
> In any case, since PostScript and PDF are not really the focus
> of mandoc(1) development, it is appreciated that some people
> appear to keep an eye on it.  Thanks.  :-)
> 
> Yours,
>   Ingo
> 
> 
> CVSROOT:      /cvs
> Module name:  src
> Changes by:   schwa...@cvs.openbsd.org        2025/07/18 09:46:58
> 
> Modified files:
>       usr.bin/mandoc : term_ps.c 
> 
> Log message:
> Adjust viscol (the distance in basic units from the column offset)
> and minbl (the minimum whitespace in basic units before the next column)
> in ps_advance() and ps_endline() because that is what term.c now expects.
> Regression reported by Jan Stary <hans at stare dot cz> on misc@.
> 
> Also adjust ps_hspan() to the new definition of basic units
> in terminal output.
> 
> 
> Index: term_ps.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term_ps.c,v
> diff -u -p -r1.57 term_ps.c
> --- term_ps.c 16 Jul 2025 14:23:55 -0000      1.57
> +++ term_ps.c 18 Jul 2025 15:44:36 -0000
> @@ -1207,6 +1207,7 @@ ps_advance(struct termp *p, size_t len)
>       ps_plast(p);
>       ps_pclose(p);
>       p->ps->pscol += len;
> +     p->viscol += len;
>  }
>  
>  static void
> @@ -1230,6 +1231,8 @@ ps_endline(struct termp *p)
>       /* Left-justify. */
>  
>       p->ps->pscol = p->ps->left;
> +     p->viscol = 0;
> +     p->minbl = 0;
>  
>       /* If we haven't printed anything, return. */
>  
> @@ -1307,7 +1310,7 @@ ps_hspan(const struct termp *p, const st
>                * scaling unit so that output is the same regardless
>                * the media.
>                */
> -             r = PNT2AFM(p, su->scale * 72.0 / 240.0);
> +             r = PNT2AFM(p, su->scale * 72.0 / 10.0);
>               break;
>       case SCALE_CM:
>               r = PNT2AFM(p, su->scale * 72.0 / 2.54);
> @@ -1340,8 +1343,7 @@ ps_hspan(const struct termp *p, const st
>               r = su->scale;
>               break;
>       }
> -
> -     return r * 24.0;
> +     return r;
>  }
>  
>  static void
>
