On Sun, Sep 01, 2019 at 06:22:00AM +0000, Niels Thykier wrote:
> Colin Watson:
> > I think I might actually extend manconv instead; it already does a
> > certain amount of what you need here and just needs autodetection of
> > input encoding and the multiple-files interface.
> > 
> > manconv is currently installed in man-db's libexecdir, but I could
> > easily move it onto $PATH.  Since it isn't currently on $PATH, that
> > would provide you with an easy way to test whether this new interface is
> > supported (I could also add "manconv --has-bulk" or something, but I
> > don't think it's necessary in this case).
> 
> SGTM. :)

For internal code organisation reasons it ended up being easier to add a
new "man-recode" tool instead.
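
If you want to detect the new tool from a script, a PATH check along
these lines might do.  This is only a sketch: it assumes man-recode ends
up installed on $PATH under that name, and have_tool is a helper name
made up for illustration.

```shell
# Hypothetical helper: succeeds if the named command is found on $PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the new tool is installed on $PATH as "man-recode".
if have_tool man-recode; then
    echo "man-recode available"
else
    echo "man-recode not found; fall back to man -l --recode"
fi
```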

Could you please try the tmp/recode-tool branch of
https://git.savannah.gnu.org/cgit/man-db.git ?  To build it, something
like this should work:

  sudo apt build-dep man-db
  ./bootstrap
  ./configure --prefix=/usr --libexecdir=\${libdir} \
    --with-config-file=/etc/manpath.config --enable-mb-groff \
    --enable-silent-rules --with-db=gdbm
  make -j4
  make -j4 check

You should then be able to run src/man-recode.

Initial performance testing from my end: to convert all the pages in
manpages-pl to UTF-8, it takes about 0.6 seconds.  This is cheating
slightly because it takes a short cut in the case where the pages
already appear to be in UTF-8; so if I instead tell it to convert to
ISO-8859-2, it takes about 6.3 seconds.  Compared to about 122 seconds
(without parallelisation) with "man -l --recode UTF-8", I think that's
probably good enough.
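
For a rough sense of scale, the ratios between those timings can be
worked out as below.  The speedup helper is made up for illustration;
the only inputs are the figures quoted above, which come from a single
informal run.

```shell
# Hypothetical helper: ratio of two timings in seconds, to one decimal place.
speedup() {
    LC_ALL=C awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

speedup 122 6.3   # man -l --recode vs. man-recode converting to ISO-8859-2
speedup 122 0.6   # man -l --recode vs. man-recode taking the UTF-8 short cut
```

That comes to roughly 19x for a real conversion and about 200x when the
short cut applies.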

Thanks,

-- 
Colin Watson                                       [[email protected]]
