Re: GNU troff output and font metrics demystified (was: Greek letters not slanted in -Tps eqn output)

2022-08-09 Thread joerg van den hoff

second time with complete list of addressees ...

hi branden,

thank you for this background info. really appreciated. I was aware of part
of it but definitely not all of it.

out of curiosity: if I get this right, presence and usage of the t-command
thus excludes any application of microtypography tweaks like those heirloom
troff is capable of, correct? but groff _could_ do it by avoiding the t-command
(which it seems to be able to do if I understand you correctly)?

but regarding my initial problem of erroneous rendering of something like

.EQ
alpha beta gamma delta sigma rho 1 over 2
.EN

I continue to _not_ see how and why exactly the 1/2 gets misformatted and
why the digits are increasingly shifted off to the right of the fraction bar
when more greek letters are put in front (and SS is not found and silently
replaced by S etc.)

best,
joerg


On 08.08.22 22:54, G. Branden Robinson wrote:

[looping in main groff list since I went into didact mode]

Hi Joerg,

At 2022-08-08T21:32:24+0200, joerg van den hoff wrote:

I think I nearly get it now regarding font selection. but regarding
font metrics: my rudimentary understanding was/is that each glyph
essentially gets assigned a rectangle ("bounding box" of the glyph)
and that troff just arranges those boxes next to each other without
gap or overlap, usually?


A "bounding box" for a glyph is a useful concept to have in mind when
designing or producing metrics for a font, but I think it has less
utility when attempting to reason about what troff actually _does_.

If you review the groff_out(5) man page or "gtroff Output" node of our
Texinfo manual, you can get some background on this subject.  (Some of
the writing is subpar IMO--it's an area I have not yet revised to my
satisfaction.)

Consider a simple example.

$ echo foo | groff -Z
x T ps  # output is for 'ps' (PostScript) device
x res 72000 1 1 # set up resolution & horiz., vert. motion quanta
x init  # begin document
p1  # set page number
x font 5 TR # mount Times Roman font at position 5
f5  # select font position 5 for printing
s1  # set type size in basic units (= 10 points)
V12000  # move vertically to y=12,000 units
H72000  # move horizontally to x=72,000 units
md  # set default stroke color
DFd # set default fill color
tfoo  # output letters 'f', 'o', 'o', advancing each time
n12000 0  # declare presence of line break (documentary)
x trailer   # end document
V792000 # move vertically to page bottom (to cause an ejection)
x stop  # end of troff output

The 't' command is where the real work of writing the glyphs is taking
place.  And as a matter of fact, this command did not exist in AT&T
device-independent troff--it's a groff extension.  In an early version
of AT&T troff, something like the following might have been produced.[1]

cf  # format glyph 'f'
h3330   # move right 3,330 units
co  # format glyph 'o'
h5000   # move right 5,000 units
co  # format glyph 'o'
h5000   # move right 5,000 units

Here we see that each glyph is formatted and then the drawing position
moved by its width according to the font description file, and scaled for
the type size of 10 points.  If we look at the description of the Times
Roman font, go down to the "charset" section which contains the metrics,
and consult the ones for "f" and "o", you will see that the first values
declared are "333" and "500", respectively.  That's at a type size of 1
point, and we're at 10 points, hence the multiplication.
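
The arithmetic above can be sketched in a few lines of Python.  This is
only an illustration, not what any troff actually runs: the two widths
are the ones quoted above for "f" and "o", and the function names are
made up for the example.

```python
# Widths for "f" and "o" as quoted above from the "charset" section of
# the TR font description; they apply at a type size of 1 point.
TR_WIDTHS = {"f": 333, "o": 500}


def scaled_width(glyph, point_size):
    """Scale a 1-point glyph width to the given type size (hypothetical helper)."""
    return TR_WIDTHS[glyph] * point_size


def att_style_commands(text, point_size):
    """Emit one 'c' (format glyph) and one 'h' (move right) command per glyph."""
    commands = []
    for glyph in text:
        commands.append(f"c{glyph}")
        commands.append(f"h{scaled_width(glyph, point_size)}")
    return commands


print("\n".join(att_style_commands("foo", 10)))
```

At 10 points this reproduces the cf/h3330/co/h5000/co/h5000 sequence
shown earlier.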

In truth, AT&T's numbers weren't as large because the device resolutions
were not as high.  You can read more about this in groff_font(5), which
I _have_ revised for clarity, so if anything there is vague, that's a
bug in my writing (or my understanding)--please let me know.

But you can begin to see why AT&T stopped doing it this way.  In a
typical document, command pairs like this are going to be the bulk of
troff output.  The short version of a longer story told in CSTR #97 is
that Kernighan introduced an optimization, one which was inapplicable to
glyphs requiring motions expressible with 3 or more digits.  So James
Clark (as far as I know) came up with the 't' command, which relied upon
the output driver (like grops(1)) to look up the glyph widths and scale
them to the type size for itself.  As a bonus, it made the formatted
output text much easier to locate in GNU troff output.
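
As a rough picture of what the 't' command asks of the output driver,
here is a small Python sketch.  It is not grops code; the function name
is invented, and the widths are just the two TR values quoted earlier.
The driver receives the text and the current horizontal position, looks
up each glyph's width itself, and advances as it goes.

```python
# Widths at 1 point for the two glyphs discussed above (from the TR
# font description); everything else here is a hypothetical sketch.
TR_WIDTHS = {"f": 333, "o": 500}


def place_glyphs(text, start_x, point_size):
    """Return (glyph, x) placements and the final x after a 't' command.

    The formatter supplies only the text; the width lookup and scaling
    happen on the driver side.
    """
    x = start_x
    placements = []
    for glyph in text:
        placements.append((glyph, x))
        x += TR_WIDTHS[glyph] * point_size  # advance by the scaled width
    return placements, x


placements, end_x = place_glyphs("foo", 72000, 10)
for glyph, x in placements:
    print(glyph, x)
print("end", end_x)
```

Starting from the H72000 of the example, the three glyphs land at
72000, 75330, and 80330, and the drawing position ends at 85330.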

The work of rendering a glyph is delegated to the output driver.  We ask
for a glyph from a certain typeface of a specified color and size, and
the output driver does what is necessary; it may further delegate
some tasks to a PostScript or PDF interpreter.

You'll note that nowhere in this presentation did the notion of a
"bounding box" come up.  A bounding box is a sort of fiction that helps
us to achieve agreeable placement of drawing elements when laying out a
page.  All GNU troff output knows t

Re: [groff] 09/10: groff(1): Fix error in example.

2022-08-09 Thread G. Branden Robinson
Hi Dave,

At 2022-08-05T18:10:45-0500, Dave Kemper wrote:
> > --- a/src/roff/groff/groff.1.man
> > +++ b/src/roff/groff/groff.1.man
> > @@ -1801,7 +1801,7 @@ constructing a pipeline to page the output.
> >  .RS
> >  .P
> >  .EX
> > -groff \-t \-man /usr/share/man/man1/groff.1.man | less \-R
> > +groff \-t \-man -Tutf8 /usr/share/man/man1/groff.1.man | less \-R
> 
> The trivial problem with this change is that the \ in front of the new
> -T flag is missing.

Yes, indeed!  I'll fix that.

> The slightly larger problem is that it hard-codes into the man page
> text an assumption about the user's encoding environment.  Were this
> difficult to avoid, that'd be one thing, but all that would have to
> change is the "groff" command to "nroff," since nroff.sh is pretty
> robust at figuring out the user's encoding.  And nroff has already
> been mentioned on this page a few times, so it wouldn't be coming out
> of left field at this point.

Yes, but I think you're missing the context.  This is an _example_.

.\" 
.SH Examples
.\" 
.
.I roff
systems are best known for formatting man pages.
.
Once a
.MR man 1
librarian program has located a man page,
it may execute a
.I groff
command much like the following,
constructing a pipeline to page the output.
.
.
.RS
.P
.EX
groff \-t \-man \-Tutf8 /usr/share/man/man1/groff.1.man | less \-R
.EE
.RE

I think the section title and phrases like "much like the following" cue
the reader to not slavishly copy and paste without consideration.

Similarly, I avoided demonstrating nroff because this is the groff(1)
page, and that is the subject of discussion.  Examples of nroff usage
are better placed in its own page.  Beyond that, it is important to
reinforce the fact that the nroff command does nothing that can't be
accomplished with the groff command, in contrast to (some? all?) other
troff implementations.

I added the foregoing example because unfortunately it seems to be a
mystery to many people how to get groff to directly render a man page,
and I wanted to call attention to the method.  This is important
background for addressing the religious fervor with which people oppose
grotty(1)'s emission of SGR escape sequences.[1]  If your terminal
(emulator) interprets them without the pager, but using the pager messes
up rendering, then the fault is with the pager, not grotty.

Regards,
Branden

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=312935




Re: Greek letters not slanted in -Tps eqn output

2022-08-09 Thread G. Branden Robinson
[looping in groff@ again due to a shift in discussion focus to
development]

At 2022-08-09T19:14:25+0200, joerg van den hoff wrote:
> On 09.08.22 17:05, Deri wrote:
> > groff and grops can find the SS file, which gives the widths of the
> > SS glyphs, but grops cannot find the Symbolsl.pfa file because it is
> > not listed in the download file it finds.
> 
> question: is there a deeper reason, why grops does not traverse all
> known font locations/dirs and scan all found `download' files until,
> hopefully, a hit is found? I mean except "someone would need to
> volunteer to implement it" :).

That is sadly pretty much it.  The output drivers rely on functions
implemented in libgroff to open files with cognizance of the font search
path.  These functions were written in the expectation that when
searching for one of these files, it would contain all the information
you needed to satisfy the demand.  This works great for "DESC" and files
like "TR".  You either find one that will tell you everything you need
to know, or you won't.

"download" is different.  It's not insane to keep traversing the font
search path even after finding one, because having done so is not a
guarantee that the file will have what you need.
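
To make the difference concrete, here is a sketch in Python (not the
C++ of libgroff or grops).  It assumes the simple "name, whitespace,
file name" layout of a grops download file with "#" comment lines; the
private font name and both file contents are hypothetical, and the
files are plain strings listed in search-path order.

```python
def parse_download(text):
    """Parse 'internalname filename' lines; skip blanks and '#' comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries[fields[0]] = fields[1]
    return entries


def lookup_first_found(download_files, internal_name):
    """Consult only the first download file on the search path."""
    for text in download_files:
        return parse_download(text).get(internal_name)
    return None


def lookup_merged(download_files, internal_name):
    """Keep traversing; earlier files win when a name appears twice."""
    merged = {}
    for text in download_files:
        for name, filename in parse_download(text).items():
            merged.setdefault(name, filename)
    return merged.get(internal_name)


# Hypothetical contents: a private file found first, then the default one.
private = "# my fonts\nMyFont-Regular\tmyfont.pfa\n"
default = "Symbol-Slanted\tSymbolsl.pfa\n"

print(lookup_first_found([private, default], "Symbol-Slanted"))
print(lookup_merged([private, default], "Symbol-Slanted"))
```

The first lookup comes back empty because the private file shadows the
default one; the merged lookup finds Symbolsl.pfa.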

It is possible for device or font description files to be incomplete; if
they are missing essential information, they will fail validation and
the output driver will issue diagnostics.  groff Git is much more
fastidious about such validation than 1.22.4 was.

Are you not getting a diagnostic message from grops(1) when it can't
find "Symbolsl.pfa"?  Are you running groff 1.22.4 or Git?  If the
latter then this is something I want to fix.  Even if I or someone else
implements further searching, we'll need that diagnostic for when all
download files are exhausted without a match.

> I have now read groff_font, which I think is quite clear and finally
> found the statement regarding "first file found is used". so it *is*
> spelled out but it still is easily overlooked (as I am proof of...).

A failure to emit a diagnostic message when a required resource is
unavailable also doesn't help people to understand the situation.  :(

> I also would find it more "natural" if grops/gropdf were doing that.

I agree, and the same thing occurred to me when reviewing the code.

> > Contrary to what it says in the gropdf man page (which I cobbled
> > from the grops man page), gropdf does do what you expect. It builds
> > a map from all download files found. I, too, was not aware that
> > grops didn't.
> 
> oh, is that true!? then I can revert the "merge" of my private
> `download' with the default one for gropdf :). and the manpage might
> be adjusted to explicitly emphasize that gropdf behaves that way,
> possibly?

Please do test that, Joerg!  :D

Yes, updating the gropdf man page sounds appropriate.  Deri, would you
prefer to handle this or would you like me to?

> well, if the (newer) gropdf does that, maybe it should be "backported"
> to grops :).

A simple matter of rewriting Perl code in C++.  ;-)

> > > if yes, I can understand that after the `alpha beta gamma... `
> > > sequence groff/eqn presume a different position of last greek
> > > letter than actually is going to be true downstream when the ps
> > > or pdf is generated. and so groff/eqn would position whatever
> > > comes next on a wrong horizontal position relative to the greek
> > > letters.
> > 
> > After the greek characters comes the horizontal line between 1/2. If
> > the move to the start of the line before drawing is a relative
> > movement from the end of greek characters then it will be in the
> > wrong position.
> 
> ok, in this case, yes. I would have expected the relative movement
> being relative to the fraction bar itself, but that's obviously not
> what eqn does.

This part I cannot speak to, as I have only barely begun coping with GNU
eqn's source code.  If someone else knows, I hope they will clarify.

Regards,
Branden




Re: [groff] 09/10: groff(1): Fix error in example.

2022-08-09 Thread Dave Kemper
On 8/9/22, G. Branden Robinson wrote:
> Similarly, I avoided demonstrating nroff because this is the groff(1)
> page, and that is the subject of discussion.

Oof, somehow I managed to overlook that.  Sorry for the noise.



Re: using groff/troff in producing academic journals

2022-08-09 Thread Robert Marks
As I said earlier, we used nroff/troff to produce the Australian Journal of
Management in the 1980s.

More recently — last year in fact — as the Editor I used troff to set the
text for an especially mathematical paper in the Journal and Proceedings of
the Royal Society of New South Wales (Sydney).  See the paper,
Basil Hiley.
The Moyal-Dirac controversy revisited.

Journal & Proceedings of the Royal Society of New South Wales *154*:
139-160, 2021
at
https://royalsoc.org.au/images/pdf/journal/154-2-Hiley.pdf

Robert Marks
-- 
https://www.agsm.edu.au/bobm 
0407665644


Re: Greek letters not slanted in -Tps eqn output

2022-08-09 Thread Deri
On Tuesday, 9 August 2022 20:38:44 BST G. Branden Robinson wrote:
> [looping in groff@ again due to a shift in discussion focus to
> development]
> 
> At 2022-08-09T19:14:25+0200, joerg van den hoff wrote:
> It is possible for device or font description files to be incomplete; if
> they are missing essential information, they will fail validation and
> the output driver will issue diagnostics.  groff Git is much more
> fastidious about such validation than 1.22.4 was.
> 
> Are you not getting a diagnostic message from grops(1) when it can't
> find "Symbolsl.pfa"?  Are you running groff 1.22.4 or Git?  If the
> latter then this is something I want to fix.  Even if I or someone else
> implements further searching, we'll need that diagnostic for when all
> download files are exhausted without a match.

Hi Branden,

I did deal with this issue a bit earlier. Grops (currently) has no way of 
knowing it is missing a resource - Symbolsl.pfa. The logic seems to be that if 
there is a groff font file (e.g. TR) which contains an "internalname" (e.g. 
Times-Roman), and download does not contain a reference to that internalname, 
then it is assumed the font does not need to be downloaded into the postscript 
file. This would be true for one of the base-35 fonts, since grops just needs 
to include the instruction "%%IncludeResource: Times-Roman" which signals to 
the consumer of the postscript file that it needs to provide the font itself 
(i.e. it is not embedded).

When this logic is used for a font which is not base-35 problems can occur 
because the consumer (i.e. ghostscript) has to guess what font to use. It does
not look like grops is aware of the 35 internalnames which form the base-35, 
so I cannot see a way how it can know it has a missing resource. It is relying 
on the presence of an entry in the download file to determine that it requires 
a type 1 font to be embedded in the postscript, so if the download file does 
not include the internalname Symbol-Slanted it does not know it should have
embedded Symbolsl.pfa.

If grops did include the 35 internalnames in its source then it could 
determine a resource was missing, by checking the internalname against that 
list, if it is not there and it is not found in the download file, it will 
know there is a problem. 
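
A sketch of that check, in Python rather than grops's C++, and purely
hypothetical: the function name and return values are invented, and the
set below is only an abbreviated stand-in for the full list of 35
internalnames.

```python
# Abbreviated stand-in for the base-35 internalnames (not the full list).
BASE_FONTS = frozenset({"Times-Roman", "Symbol", "Courier", "Helvetica"})


def classify_font(internal_name, download_map):
    """Decide how a font would be handled under the proposed logic.

    'embed'            -> listed in download, so embed the Type 1 file
    'include-resource' -> a base font; emit %%IncludeResource and let the
                          consumer supply it
    'missing'          -> neither; this is where a diagnostic belongs
    """
    if internal_name in download_map:
        return "embed"
    if internal_name in BASE_FONTS:
        return "include-resource"
    return "missing"


# A download map that lacks Symbol-Slanted, as in the case discussed.
download_map = {}
for name in ("Times-Roman", "Symbol-Slanted"):
    print(name, classify_font(name, download_map))
```

With the entry absent, Times-Roman is still fine but Symbol-Slanted is
flagged as missing instead of being passed along silently.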

[snip]
> 
> Yes, updating the gropdf man page sounds appropriate.  Deri, would you
> prefer to handle this or would you like me to?

'tis done.

> > well, if the (newer) gropdf does that, maybe it should be "backported"
> > to grops :).
> 
> A simple matter of rewriting Perl code in C++.  ;-)

Ooh, I love a hash (particularly corned beef). :-)
 
> > > > if yes, I can understand that after the `alpha beta gamma... `
> > > > sequence groff/eqn presume a different position of last greek
> > > > letter than actually is going to be true downstream when the ps
> > > > or pdf is generated. and so groff/eqn would position whatever
> > > > comes next on a wrong horizontal position relative to the greek
> > > > letters.
> > > 
> > > After the greek characters comes the horizontal line between 1/2. If
> > > the move to the start of the line before drawing is a relative
> > > movement from the end of greek characters then it will be in the
> > > wrong position.
> > 
> > ok, in this case, yes. I would have expected the relative movement
> > being relative to the fraction bar itself, but that's obviously not
> > what eqn does.
> 
> This part I cannot speak to, as I have only barely begun coping with GNU
> eqn's source code.  If someone else knows, I hope they will clarify.

Sigh! Again.

It is not eqn.
It is not groff.
It is not grops (although a warning would be nice, but see above).
It is ghostscript, which when it can't include the resource for Symbol-Slanted 
picks Symbol instead, which has different glyph widths, rather than barf with 
a message "Can't find font Symbol-Slanted".

Your focus on what groff -Z produces is misplaced. It will have no bearing on 
the pdf code which ghostscript produces. Ghostscript must build up a page as 
an internal structure and then use an output driver to produce appropriate 
output, pcl, pdf etc.. So if you use ghostscript to process ps->ps or pdf->pdf 
you will find very big differences internally, but the actual output will look 
the same.

Just for a bit of fun!! Here's the groff_out,  the pdf instructions produced 
by ghostscript and the pdf instructions produced by gropdf.

First groff:-

x font 5 TR
f5
s1
V84000
H72000
md
DFd
tA
wh2500
tsimple
wh2500
tequation
wh2500
tsuch
wh2500
tas
n12000 0

Each word is a separate t command, and since the TR font has "spacewidth 250" 
and the point size of the font is 10 this explains the wh2500 as a white-space 
horizontal movement.
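
For anyone who wants to check that reading of the stream, here is a
small Python sketch.  It is not part of groff; the function name is
made up, and it only understands the 't' and 'wh' lines shown above.

```python
SPACEWIDTH = 250   # "spacewidth 250" from the TR font description
POINT_SIZE = 10    # type size in points


def decode(lines):
    """Rebuild the text and collect the word-space motions.

    't<word>' contributes a word; 'wh<n>' marks a word space followed by
    a horizontal motion of n units.
    """
    pieces = []
    motions = []
    for line in lines:
        if line.startswith("t"):
            pieces.append(line[1:])
        elif line.startswith("wh"):
            pieces.append(" ")
            motions.append(int(line[2:]))
    return "".join(pieces), motions


stream = ["tA", "wh2500", "tsimple", "wh2500", "tequation",
          "wh2500", "tsuch", "wh2500", "tas"]
text, motions = decode(stream)
print(text)
print(motions, SPACEWIDTH * POINT_SIZE)
```

Every motion matches 250 x 10 = 2500, and the words read back as
"A simple equation such as".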

Now for the ghostscript produced pdf instructions:-

2.5 Tc
/R7 10 Tf
1 0 0 1 72 708 Tm
(As)Tj
0 Tc
13.6102 0 Td
(imple equation such as)Tj

A bit odd! "A s" has been squashed to "As" but it is preceded by the 2.5 Tc
which is an instruction to use 2.

Re: Counterexamples in C programming and library documentation (was: [PATCH v3] NULL.3const: Add documentation for NULL)

2022-08-09 Thread James K. Lowden
On Tue, 2 Aug 2022 14:06:45 -0500
"G. Branden Robinson" wrote:

> (the "root of all evil" thing), which also got stuffed into
> Donald Knuth's mouth

Knuth did say it.  Please see "Structured Programming with go to
Statements" mentioned in "ACM Computing Surveys Vol. 6, No. 4" at

https://dl.acm.org/doi/10.1145/356635.356640
and
https://dl.acm.org/doi/pdf/10.1145/356635.356640.  

Regarding, 

> https://ubiquity.acm.org/article.cfm?id=1513451

Hyde's assertion is that performance intuition is easy and necessary.
His advice boils down to learning about hardware and compilers.  That
won't get you far writing fast programs today.  

The highest order of performance in almost any application today is
minimizing I/O, because I/O is orders of magnitude slower than
anything else.  

Second is knowing the language of implementation.  I once wrote a
slow program in Perl.  It used the very clever pack/unpack functions on
a file of hundreds of megabytes, one 1 KB record at a time.  It ran
IIRC 90 minutes.  Eventually someone rewrote it for me in C, after I
told them not to, and the runtime fell to 5 minutes.  I doubt it was
particularly good C; it's just hard to write obvious C that's slower
than obvious Perl.  

Python programmers know this dictum well.  The key to fast Python is no
Python: make sure every line packs a wallop. Push all the processing
down into libraries written in C, to avoid the slow-as-molasses
interpreter.  Similarly, Java programmers know how to rein in the
garbage collector.  

Only after those hurdles are cleared do we arrive at standard-issue CS
concerns like big-O complexity and NUMA memories.  Even there, the
practical solution may well be found in Rob Pike's observation, 

"Complex algorithms are slow for small N, 
 and N is usually small."

Hyde is right -- although he doesn't exactly say so -- that performance
is an emergent property of design.  Rare is the system with
implementation performance problems that can be improved by finding the
hot spots.  More common, in my experience, are errors in the design
that no amount of implementation can correct.  

--jkl