Follow-up Comment #15, bug #68145 (group groff):
[comment #13:]
> I don't know if this is helpful.
I think it is!
> With unlimited page length the value of 'vpos' just keeps increasing.
Yes. That's by design; it's the nature of the continuous rendering beast. At
least in the "new" approach.
> At the bottom of the multi bash file it is set to
> V16440280
That sounds about right. It's about 56 times the vertical drawing
position at the end of _one_ copy of the bash man page on my system.
$ groff -man -T utf8 -Z $(man -w bash) | grep '^V' | tail -n 1
V289880
$ echo '16440280/289880' | bc -l
56.71408858838139919966
So it's in the ballpark for 50 (or 64) copies of the document.
> In that case this code in tty.cpp looks expensive:-
> if (vpos > nlines) {
> tty_glyph **old_lines = lines;
> lines = new tty_glyph *[vpos + 1];
> memcpy(lines, old_lines, nlines * sizeof(tty_glyph *));
> for (int i = nlines; i <= vpos; i++)
> lines[i] = 0;
> delete[] old_lines;
> nlines = vpos + 1;
> }
> Under 1.23.0 the max value for vpos is 9960. This code is called in add_char
> so I assume it is called for every character, and vpos is incremented for
> every line output.
Yes and no. add_char() _is_ called for every output character, but this
new/memcpy/delete sequence, which demands a lot of the language runtime's
dynamic memory allocator, doesn't run for every character added, because there
are two `if` guards, one of which you quoted.
if (v == cached_v && cached_v != 0)
  ...
else
  ...
  if (vpos > nlines) {
    ...
So I think the allocation dance happens only when `v` differs from `cached_v`
(or `cached_v` is zero) _and_ `vpos` exceeds `nlines`.
That should happen, at worst, with every line written by _grotty_.
That said, for 50 copies of _bash_(1):
$ groff -man -T utf8 $(man -w bash) | wc -l
7247
$ echo '7247 50 * p' | dc
362350
...which is a lot of _memcpy_(3) calls.
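For a rough sense of scale, here is a worst-case estimate under stated assumptions: one reallocation per output line, starting from the initial 66 slots, with each reallocation copying the old `nlines` pointers. The total number of pointers copied over N lines is then

```latex
\sum_{k=66}^{N-1} k \;=\; \frac{N(N-1)}{2} - \frac{65 \cdot 66}{2} \;\approx\; \frac{N^2}{2}
```

With N = 362350 that comes to about 6.6 × 10^10 pointer copies, or roughly 5.3 × 10^11 bytes with 8-byte pointers. The guards keep the real figure below this bound, but the N^2 term is why the cost grows faster than the input.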
> Pretty sure my analysis is /wrong/incomplete/unhelpful/ as usual. 😄
No, I think it's worth exploring. I think the next thing to do is instrument
this code to count the number of times grotty reallocates its character cell
array. (That's what this `lines` thing is.)
We should find out if the Arch Linux users suffering the performance hit get
the same number.
If they don't, we definitely want to find out why.
If they do, then we can ask them to take up the quadratic performance
degradation with the vendor of their C++ runtime.
Either way, there _might_ be something we can do in _grotty_ about this.
A. We could support a command-line option that pre-sizes `nlines` to a
specified value, or to something gigantic. The variable is referenced in
only a few places. It's initialized to "66", which is bog-standard 12-point
spacing on an 11-inch-tall U.S. letter piece of paper. We already have
satisfactory experiences with the page length being shortened below that just
before the document ends.
B. We could have the _man_(7) (and _mdoc_(7)) packages use a device extension
command to transmit a hint to _grotty_ that continuous rendering is in use,
and therefore `nlines` should be huge.
There are tradeoffs here.
1. Having _grotty_ demand more memory than it's going to actually use is
discourteous to other processes on the system.
2. The reported problem arises only in pathological cases. While the
original report claims a performance degradation for inputs of all sizes, I
observe from the results in comment #8 that a document doesn't render twice as
slowly as before until it's eight times the size of the Bash man page. That's
a 3 megabyte input document. The new approach to continuous rendering solves
real-world problems, like misdrawn vertical rules in the Linux man-pages
_ascii_(7) document--which happens to be **much** shorter than the Bash man
page, let alone multiple copies thereof. A performance regression that
affects only extreme outliers of possible inputs might not be worth solving.
And the real issue might ultimately not be ours anyway. Maybe somebody's
Standard C++ Library needs to use a different heap management strategy, or
support configurable hints available on a per-process basis for selecting
among several.
One approach that I saw used in C with the X Window System was, once the
dynamic storage allocated to some variable (often an array) was almost full,
to `realloc()` it to double the size. Repeat as needed. That would work well
with _grotty_(1)'s use case. (And we could actually handle `nlines` this way
ourselves.) Where it's not so good is if your storage requirement doesn't
monotonically increase but bounces around. There was a case like this in X.
I added code to some piece of it to `realloc()` the space _smaller_ once
its usage dropped to one-quarter of its allocated size. Why
one-quarter? Because one-half would take you back to where you were at the
last doubling, and if you have the misfortune to be servicing requests that
repeatedly take you just over the limit and then duck back below it, you'll
thrash the allocator. With the double-and-one-quarter approach, only **big**
swings in a memory region's utilization prompt reallocation.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?68145>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
