On Mon, Jan 17, 2022 at 09:15:14PM +0100, Steinar H. Gunderson wrote:
> On Mon, Jan 17, 2022 at 04:10:02AM +0000, Colin Watson wrote:
> > We definitely do need to sort out encoding conversion, though. Although
> > UTF-8 has been recommended for many years, policy still allows "the
> > usual legacy encoding" and we've never got round to mandating UTF-8:
> >
> > $ w3m -dump https://lintian.debian.org/tags/national-encoding | grep --count usr/share/man
> > 502
>
> I tried looking at this, but TBH I don't think the man-db path _ever_
> inserts a conversion. The parameter to the lexer path is simply always NULL
> in all calls, except for from a test. Am I missing something?

test_manfile (which despite the name is not a test function) calls
find_name with file != "-" and encoding = NULL; that causes find_name to
call get_page_encoding, which always returns something non-NULL
("ISO-8859-1" for English pages), and then call add_manconv from that to
UTF-8.

> > Would you care to have a look at this?
> >
> > https://gitlab.com/cjwatson/man-db/-/merge_requests/2
>
> Thanks! I'll have a go at a review, but it might need to wait until the end
> of the week. I'll try to get it done earlier, though. How would you like any
> review comments? Email or somehow in Gitlab?

Mild preference for GitLab MR comments, but I'm not that fussy.

> > There's probably still room for improvement, but unlikely to be much
> > more than a factor of two or so at this point, and I think this should
> > get us comfortably back to the point where it's no longer annoying
> > people during upgrades.
>
> Someone else suggested an idea I thought of throwing around: In addition
> to optimizing the code, perhaps the postinst trigger should simply launch the
> man-db timer in the background? At least for systemd users, this should be
> pretty straightforward, and give the control back to the installation
> process. (Given that dpkg is very much single-threaded, we're not generally
> throughput-bound, so using up a core shouldn't be a big problem.)

I suppose that's an option; apt-xapian-index.postinst seems to be
precedent for doing things in the background, albeit in a non-systemd
way. Unlike the postinst, man-db.service omits -p/--no-purge, but maybe
that would be OK, especially after this round of optimization. I'll
consider that once I get to packaging these changes.

-- 
Colin Watson (he/him) [cjwat...@debian.org]