On Sun, Sep 01, 2019 at 06:22:00AM +0000, Niels Thykier wrote:
> Colin Watson:
> > I think I might actually extend manconv instead; it already does a
> > certain amount of what you need here and just needs autodetection of
> > input encoding and the multiple-files interface.
> >
> > manconv is currently installed in man-db's libexecdir, but I could
> > easily move it onto $PATH.  Since it isn't currently on $PATH, that
> > would provide you with an easy way to test whether this new interface is
> > supported (I could also add "manconv --has-bulk" or something, but I
> > don't think it's necessary in this case).
>
> SGTM. :)

For internal code organisation reasons it ended up being easier to add a
new "man-recode" tool instead.  Could you please try the tmp/recode-tool
branch of https://git.savannah.gnu.org/cgit/man-db.git ?  To build it,
something like this should work:

  sudo apt build-dep man-db
  ./bootstrap
  ./configure --prefix=/usr --libexecdir=\${libdir} --with-config-file=/etc/manpath.config --enable-mb-groff --enable-silent-rules --with-db=gdbm
  make -j4
  make -j4 check

You should then be able to run src/man-recode.

Initial performance testing from my end: to convert all the pages in
manpages-pl to UTF-8, it takes about 0.6 seconds.  This is cheating
slightly because it takes a short cut in the case where the pages
already appear to be in UTF-8; so if I instead tell it to convert to
ISO-8859-2, it takes about 6.3 seconds.  Compared to about 122 seconds
(without parallelisation) with "man -l --recode UTF-8", I think that's
probably good enough.

Thanks,

-- 
Colin Watson                                       [[email protected]]

