On Tue, Jan 04, 2022 at 10:21:58AM +0100, Steinar H. Gunderson wrote:
> I took a look at mandb's profile in perf, and even after turning off
> libseccomp, it appears that perhaps 10–11% of its time is spent doing real
> work (decompression, lexer, malloc, character set conversion). The rest is
> kernel overhead from launching and exiting pipelines, which is a fairly steep
> price.

I made a straw man to test whether this was really true, and it turns out it is.
See the attached patch, which rips out the pipelines from mandb and replaces
them with simple one-shot decompression buffers; it's by no means something
that should be applied in its current state, but it reliably finds and parses
all 24216 man pages on my system, giving identical behavior (as determined by
“apropos ''”) to the current man-db. But it finishes in 4.7 seconds instead
of 51, so indeed, about 9% of the time, and at a performance point where it
is unlikely to be too painful in a full-upgrade.

Of course, this is an unfair comparison because it does not do encoding
conversion, but most pages do not need that, and in any case, adding it would
be unlikely to push the runtime above 10–11% of the current man-db's. It also
lacks error handling, support for arbitrary-size man pages (it uses fixed
buffers instead of streaming; again, completely possible to fix, or one could
set upper limits), anything resembling good coding style, and shelling out to
col(1). (I believe the latter is only needed for parsing cat pages, though,
which would seem less relevant today.)
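For illustration only, a minimal C sketch of the fixed-buffer, one-shot idea
with an upper limit might look like this. It assumes the page is already
decompressed (or stored uncompressed), so it only shows the "no pipes, no
child processes, one read into a bounded buffer" part. The function name
read_page_oneshot and the 1 MiB MAX_PAGE_SIZE cap are hypothetical and are
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical upper limit on page size; the patch's actual limit is not stated. */
#define MAX_PAGE_SIZE (1024 * 1024)

/*
 * Read an entire (already decompressed) page into a caller-supplied
 * fixed buffer in one shot, with no child processes or pipes involved.
 * Returns the number of bytes read, or -1 on open/read error or if the
 * page does not fit in `cap` bytes (i.e. it exceeds the upper limit).
 */
long read_page_oneshot(const char *path, char *buf, size_t cap)
{
    assert(buf != NULL || cap == 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(buf, 1, cap, fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    /* Buffer is full: if any byte remains, the page is over the limit. */
    if (n == cap && fgetc(fp) != EOF) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (long)n;
}
```

A caller would declare something like `static char page[MAX_PAGE_SIZE];` and
treat -1 as "skip this page", which is one way to get the upper-limit
behavior instead of streaming.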

But it does set a bar that I think should be approachable by mandb, without
any special microoptimization, multithreading, fancy parsing algorithms
or the like. :-)

/* Steinar */
-- 
Homepage: https://www.sesse.net/
