[redirecting to groff@gnu; bug-groff is mainly a reflector for the
Savannah ticket tracker for groff]

Hi Brian,

At 2023-02-11T18:17:49-0700, Brian Inglis wrote:
> Hi folks,
> 
> Running Cygwin groff 1.22.4 mandb 2.11.2 man -H tbl is not rendered from
> newlib strftime.3 man page (truncated after .TE, 44 redundant occurrences of
> "l l" removed before "l l.", other lines commented out in .3 file attached,
> as is generated HTML, and docbook source: see below).

Ah.  DocBook source.  I'll give you the bad news first: a high-quality
converter of DocBook documents to man(7) is not known to the groff
community.

I un-commented the following lines to reduce the number of warnings I
saw.

.ie \n(.g .ds Aq \(aq
.el       .ds Aq '

> This man page tbl extract is interesting, as it needs at least the .TH
> directive plus the .TS/.../.TE lines to generate the tty man page,

Yes.  I got the following diagnostics from groff Git HEAD.

$ ./build/test-groff -ww -t -man -Thtml \
    ~/Downloads/newlib-strftime-tbl.3>strf.html
/home/branden/Downloads/newlib-strftime-tbl.3:49: error: boxed table does not fit on page 1; use .TS H/.TH with a supporting macro package
troff: error: suppression limit registers span more than a page; grohtml-info for image 1 will be wrong

(The second diagnostic can be ignored for our purposes.)

You (well, the DocBook-related tool) need to construct the table with
".TS H" instead of plain ".TS".  This is a hard requirement for
multi-page tables that are boxed or for which column heading repetition
is desired.
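
As a rough illustration only, here is the general shape of such a
table.  It assumes a macro package that supports multi-page tables
(ms, say) rather than man(7), and the heading row is invented for the
example; the table in the attached page has none.

```roff
.\" Sketch only: "Specifier" and "Meaning" are hypothetical headings.
.TS H
allbox tab(:);
lb lb
l lx.
Specifier:Meaning
.TH
%a:T{
The abbreviated weekday name according to the current locale.
T}
%A:T{
The full weekday name according to the current locale.
T}
.TE
```

Rows between the format specification and the ".TH" call are treated as
the heading, which the macro package repeats on each page the table
spans.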

The only reason the table renders anyway on the terminal is that the
groff man(7) package has a feature that grows the page length in an
unbounded[1] way.  This is termed "continuous rendering" in the
groff_man(7) man page.  If you turn this feature off with the "-rcR=0"
option to the formatter...

$ ./build/test-groff -ww -t -rcR=0 -man -Tascii \
    ~/Downloads/newlib-strftime-tbl.3 | less -R

...you will get the same problem.  I got it for PostScript, too, so I
expect this document's problem to afflict every output device except
terminals, and those as well if continuous rendering is not used.

> whereas tables from other man pages can be extracted and still render
> on both tty and HTML.

I suspect that this is because they, perhaps accidentally, manage to
follow the rules about composing large tables.

> If you can point me to something in the tbl content that is
> problematic and/or how to fix it, given the below, then I could work
> my way back down the chain to fix the root cause, or determine it is a
> groff/man bug, possibly fixed in the pending release, I would greatly
> appreciate the pointer.

I do not observe any bugs with groff/man/tbl for this page with groff
Git HEAD.  (Some significant bugs in tbl have been fixed since groff
1.22.4,[3] but at a glance I don't see any implicated by this input.)

I will note that the table is composed with one hand tied behind its
back, so to speak, by using text blocks in the right-hand column but not
the "x" column modifier in that column to permit it to expand until the
table fills the available line length.  Unfortunately that is not enough
to repair this table's interaction with the HTML output device, but it
is nevertheless a tbl(1) feature that any converter to *roff/man/tbl
output should exercise when warranted.

Long story short, the line:

l l.

could become

l lx.

profitably.

The forced call of paragraphing macros at the start of every text block
is also unidiomatic, and, given that it puts a blank line at the top of
the table cell, ugly as well.

> [The man page source comes from docbook comments embedded in the
> source:

Er, uh, looking at it, are you sure this is DocBook?

Compare

https://en.wikipedia.org/wiki/DocBook#Sample_document

with
> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/strftime.c

  /*
  FUNCTION
  <<strftime>>, <<strftime_l>>---convert date and time to a formatted string

  INDEX
          strftime

  INDEX
          strftime_l

  SYNOPSIS
          #include <time.h>
          size_t strftime(char *restrict <[s]>, size_t <[maxsize]>,

...this really does not look like DocBook to me.  (Admittedly I haven't
looked at DocBook in well over a decade.)  Perhaps it is some other
plain text markup format that has sworn to supplant all others.[2]
ReST?  Sphinx?

On the bright side, if it isn't DocBook, there is a less notorious
legacy of failure in producing *roff/man/tbl output from it.  So maybe
the problems _can_ be fixed.

I'm afraid using ".TS H" for large boxed tables is a non-negotiable
item simply due to the way the *roff systems format pages.  (Details
available upon request.)  On the bright side, if a table has column
headings, using ".TS H" even on a table with few enough rows to fit a
single page is harmless.  But on the gripping hand, this is not done in
man(7) pages because of the name collision between tbl's understanding
of '.TH' (marking the end of column headings) and man(7)'s (start a new
man page document).  This collision could be worked around, but I've
seldom seen anyone ask for multi-page tables in man pages.

This table lacks column headings altogether, however, so my prescription
would be to drop the "allbox" region option.  It is a cosmetic feature
and not required in most tables.

Indeed, if I add the "x" column modifier, drop "allbox", and kill the
".PP" calls inside all the text blocks, the content formats pleasantly
enough on PostScript.  The blank lines between table rows can be
recovered with a quick bit of Vim to emplace blank table rows after each
text block.  I'm attaching a specimen.

But really at this point I have to wonder why the translation tool
doesn't format input like this using tagged paragraphs.  This is the
`TP` macro, documented in groff_man(7).
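
A minimal sketch of what that might look like for two of the entries
follows.  The choice of bold for the tags is mine, not something the
existing tool is known to emit.

```roff
.\" Sketch only: two entries from the table recast as tagged paragraphs.
.TP
.B %a
The abbreviated weekday name according to the current locale.
[tm_wday]
.TP
.B %b
The abbreviated month name according to the current locale.
[tm_mon]
```

Tagged paragraphs need no tbl preprocessing, and the formatter can break
pages between them without any special arrangement.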

> via makedocbook python script which generates xml (attached),

I think maybe something is generating DocBook or DocBook-XML _from_ this
<<nesty bracket>> markup as the initial stage.

> then docbook generates html, man pages, PDFs, texinfo, and the latter
> generate libc info.]

Personally, I lack confidence in DocBook to occupy this central role in
document format interconversion.  Others on the groff list may want to
share their experiences.

Regards,
Branden

[1] technically not unbounded; to 2^31-1 basic units
[2] https://xkcd.com/927/
[3] https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&func=&set=custom&msort=0&report_id=225&advsrch=0&bug_id=&summary=&submitted_by=0&resolution_id=0&assigned_to=0&bug_group_id=0&status_id=3&severity=0&category_id=109&plan_release_id=103&history_search=0&history_field=0&history_event=modified&history_date_dayfd=12&history_date_monthfd=2&history_date_yearfd=2023&chunksz=50&spamscore=5&boxoptionwanted=1#options

(I wish Savannah didn't have such ridiculous query URLs; most of these
parameters recapitulate defaults.)
'\" t
.TH "STRFTIME" "3" "01/19/2023" "newlib" "Time Functions (time.h)"
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" .nh
.\" .ad l
.\" .SH "NAME"
.\" strftime, strftime_l \- convert date and time to a formatted string
.\" .SH "SYNOPSIS"
.\" .sp
.\" .ft B
.\" .nf
.\" #include <time\&.h>
.\" .fi
.\" .ft
.\" .HP \w'size_t\ strftime('u
.\" .BI "size_t strftime(char\ *restrict\ " "s" ", size_t\ " "maxsize" ", const\ char\ *restrict\ " "format" ", const\ struct\ tm\ *restrict\ " "timp" ");"
.\" .HP \w'size_t\ strftime_l('u
.\" .BI "size_t strftime_l(char\ *restrict\ " "s" ", size_t\ " "maxsize" ", const\ char\ *restrict\ " "format" ", const\ struct\ tm\ *restrict\ " "timp" ", locale_t\ " "locale" ");"
.SH "DESCRIPTION"
.PP
.\" strftime
.\" converts a
.\" struct tm
.\" representation of the time (at
.\" \fItimp\fR) into a null\-terminated string, starting at
.\" \fIs\fR
.\" and occupying no more than
.\" \fImaxsize\fR
.\" characters\&.
.PP
.\" strftime_l
.\" is like
.\" strftime
.\" but creates a string in a format as expected in locale
.\" \fIlocale\fR\&. If
.\" \fIlocale\fR
.\" is LC_GLOBAL_LOCALE or not a valid locale object, the behaviour is undefined\&.
.PP
.\" You control the format of the output using the string at
.\" \fIformat\fR\&.
.\" *\fIformat\fR
.\" can contain two kinds of specifications: text to be copied literally into the formatted string, and time conversion specifications\&. Time conversion specifications are two\- and three\-character sequences beginning with `%\*(Aq (use `%%\*(Aq to include a percent sign in the output)\&. Each defined conversion specification selects only the specified field(s) of calendar time data from
.\" *\fItimp\fR, and converts it to a string in one of the following ways:
.PP
.TS
tab(:);
l lx.
T{
%a
T}:T{
The abbreviated weekday name according to the current locale\&. [tm_wday]
T}
\&
T{
%A
T}:T{
The full weekday name according to the current locale\&. In the default "C" 
locale, one of `Sunday\*(Aq, `Monday\*(Aq, `Tuesday\*(Aq, `Wednesday\*(Aq, 
`Thursday\*(Aq, `Friday\*(Aq, `Saturday\*(Aq\&. [tm_wday]
T}
\&
T{
%b
T}:T{
The abbreviated month name according to the current locale\&. [tm_mon]
T}
\&
T{
%B
T}:T{
The full month name according to the current locale\&. In the default "C" 
locale, one of `January\*(Aq, `February\*(Aq, `March\*(Aq, `April\*(Aq, 
`May\*(Aq, `June\*(Aq, `July\*(Aq, `August\*(Aq, `September\*(Aq, 
`October\*(Aq, `November\*(Aq, `December\*(Aq\&. [tm_mon]
T}
\&
T{
%c
T}:T{
The preferred date and time representation for the current locale\&. [tm_sec, 
tm_min, tm_hour, tm_mday, tm_mon, tm_year, tm_wday]
T}
\&
T{
%C
T}:T{
The century, that is, the year divided by 100 then truncated\&. For 4\-digit 
years, the result is zero\-padded and exactly two characters; but for other 
years, there may be a negative sign or more digits\&. In this way, `%C%y\*(Aq is 
equivalent to `%Y\*(Aq\&. [tm_year]
T}
\&
T{
%d
T}:T{
The day of the month, formatted with two digits (from `01\*(Aq to `31\*(Aq)\&. 
[tm_mday]
T}
\&
T{
%D
T}:T{
A string representing the date, in the form `"%m/%d/%y"\*(Aq\&. [tm_mday, 
tm_mon, tm_year]
T}
\&
T{
%e
T}:T{
The day of the month, formatted with leading space if single digit (from 
`1\*(Aq to `31\*(Aq)\&. [tm_mday]
T}
\&
T{
%Ex
T}:T{
In some locales, the E modifier selects alternative representations of certain 
modifiers
x\&. In newlib, it is ignored, and treated as %x\&.
T}
\&
T{
%F
T}:T{
A string representing the ISO 8601:2000 date format, in the form 
`"%Y\-%m\-%d"\*(Aq\&. [tm_mday, tm_mon, tm_year]
T}
\&
T{
%g
T}:T{
The last two digits of the week\-based year, see specifier %G (from `00\*(Aq to 
`99\*(Aq)\&. [tm_year, tm_wday, tm_yday]
T}
\&
T{
%G
T}:T{
The week\-based year\&. In the ISO 8601:2000 calendar, week 1 of the year 
includes January 4th, and weeks begin on Mondays\&. Therefore, if January 1st, 2nd, 
or 3rd falls on a Sunday, that day and earlier belong to the last week of the 
previous year; and if December 29th, 30th, or 31st falls on Monday, that day 
and later belong to week 1 of the next year\&. For consistency with %Y, it 
always has at least four characters\&. Example: "%G" for Saturday 2nd January 
1999 gives "1998", and for Tuesday 30th December 1997 gives "1998"\&. [tm_year, 
tm_wday, tm_yday]
T}
\&
T{
%h
T}:T{
Synonym for "%b"\&. [tm_mon]
T}
\&
T{
%H
T}:T{
The hour (on a 24\-hour clock), formatted with two digits (from `00\*(Aq to 
`23\*(Aq)\&. [tm_hour]
T}
\&
T{
%I
T}:T{
The hour (on a 12\-hour clock), formatted with two digits (from `01\*(Aq to 
`12\*(Aq)\&. [tm_hour]
T}
\&
T{
%j
T}:T{
The count of days in the year, formatted with three digits (from `001\*(Aq to 
`366\*(Aq)\&. [tm_yday]
T}
\&
T{
%k
T}:T{
The hour (on a 24\-hour clock), formatted with leading space if single digit 
(from `0\*(Aq to `23\*(Aq)\&. Non\-POSIX extension (cf\&. %I)\&. [tm_hour]
T}
\&
T{
%l
T}:T{
The hour (on a 12\-hour clock), formatted with leading space if single digit 
(from `1\*(Aq to `12\*(Aq)\&. Non\-POSIX extension (cf\&. %H)\&. [tm_hour]
T}
\&
T{
%m
T}:T{
The month number, formatted with two digits (from `01\*(Aq to `12\*(Aq)\&. 
[tm_mon]
T}
\&
T{
%M
T}:T{
The minute, formatted with two digits (from `00\*(Aq to `59\*(Aq)\&. [tm_min]
T}
\&
T{
%n
T}:T{
A newline character (`\en\*(Aq)\&.
T}
\&
T{
%Ox
T}:T{
In some locales, the O modifier selects alternative digit characters for 
certain modifiers
x\&. In newlib, it is ignored, and treated as %x\&.
T}
\&
T{
%p
T}:T{
Either `AM\*(Aq or `PM\*(Aq as appropriate, or the corresponding strings for 
the current locale\&. [tm_hour]
T}
\&
T{
%P
T}:T{
Same as \*(Aq%p\*(Aq, but in lowercase\&. This is a GNU extension\&. [tm_hour]
T}
\&
T{
%q
T}:T{
Quarter of the year (from `1\*(Aq to `4\*(Aq), with January starting the first 
quarter\&. This is a GNU extension\&. [tm_mon]
T}
\&
T{
%r
T}:T{
Replaced by the time in a\&.m\&. and p\&.m\&. notation\&. In the "C" locale 
this is equivalent to "%I:%M:%S %p"\&. In locales which don\*(Aqt define 
a\&.m\&./p\&.m\&. notations, the result is an empty string\&. [tm_sec, tm_min, 
tm_hour]
T}
\&
T{
%R
T}:T{
The 24\-hour time, to the minute\&. Equivalent to "%H:%M"\&. [tm_min, tm_hour]
T}
\&
T{
%s
T}:T{
The time elapsed, in seconds, since the start of the Unix epoch at 1970\-01\-01 
00:00:00 UTC\&.
T}
\&
T{
%S
T}:T{
The second, formatted with two digits (from `00\*(Aq to `60\*(Aq)\&. The value 
60 accounts for the occasional leap second\&. [tm_sec]
T}
\&
T{
%t
T}:T{
A tab character (`\et\*(Aq)\&.
T}
\&
T{
%T
T}:T{
The 24\-hour time, to the second\&. Equivalent to "%H:%M:%S"\&. [tm_sec, 
tm_min, tm_hour]
T}
\&
T{
%u
T}:T{
The weekday as a number, 1\-based from Monday (from `1\*(Aq to `7\*(Aq)\&. 
[tm_wday]
T}
\&
T{
%U
T}:T{
The week number, where weeks start on Sunday, week 1 contains the first Sunday 
in a year, and earlier days are in week 0\&. Formatted with two digits (from 
`00\*(Aq to `53\*(Aq)\&. See also
%W\&. [tm_wday, tm_yday]
T}
\&
T{
%V
T}:T{
The week number, where weeks start on Monday, week 1 contains January 4th, and 
earlier days are in the previous year\&. Formatted with two digits (from 
`01\*(Aq to `53\*(Aq)\&. See also
%G\&. [tm_year, tm_wday, tm_yday]
T}
\&
T{
%v
T}:T{
A string representing the BSD/OSX/Ruby VMS/Oracle date format, in the form 
"%e\-%b\-%Y"\&. Non\-POSIX extension\&. [tm_mday, tm_mon, tm_year]
T}
\&
T{
%w
T}:T{
The weekday as a number, 0\-based from Sunday (from `0\*(Aq to `6\*(Aq)\&. 
[tm_wday]
T}
\&
T{
%W
T}:T{
The week number, where weeks start on Monday, week 1 contains the first Monday 
in a year, and earlier days are in week 0\&. Formatted with two digits (from 
`00\*(Aq to `53\*(Aq)\&. [tm_wday, tm_yday]
T}
\&
T{
%x
T}:T{
Replaced by the preferred date representation in the current locale\&. In the 
"C" locale this is equivalent to "%m/%d/%y"\&. [tm_mon, tm_mday, tm_year]
T}
\&
T{
%X
T}:T{
Replaced by the preferred time representation in the current locale\&. In the 
"C" locale this is equivalent to "%H:%M:%S"\&. [tm_sec, tm_min, tm_hour]
T}
\&
T{
%y
T}:T{
The last two digits of the year (from `00\*(Aq to `99\*(Aq)\&. [tm_year] 
(Implementation interpretation: always positive, even for negative years\&.)
T}
\&
T{
%Y
T}:T{
The full year, equivalent to
%C%y\&. It will always have at least four characters, but may have more\&. The 
year is accurate even when tm_year added to the offset of 1900 overflows an 
int\&. [tm_year]
T}
\&
T{
%z
T}:T{
The offset from UTC\&. The format consists of a sign (negative is west of 
Greenwich), two characters for hour, then two characters for minutes (\-hhmm or 
+hhmm)\&. If tm_isdst is negative, the offset is unknown and no output is 
generated; if it is zero, the offset is the standard offset for the current 
time zone; and if it is positive, the offset is the daylight savings offset for 
the current timezone\&. The offset is determined from the TZ environment 
variable, as if by calling tzset()\&. [tm_isdst]
T}
\&
T{
%Z
T}:T{
The current time zone abbreviation\&. If tm_isdst is negative, no output is 
generated\&. Otherwise, the time zone abbreviation is based on the TZ 
environment variable, as if by calling tzset()\&. [tm_isdst]
T}
\&
T{
%%
T}:T{
A single character, `%\*(Aq\&.
T}
\&
.TE
