On Mon, Jan 03, 2022 at 10:34:08PM +0000, Colin Watson wrote: > This part is already covered in https://bugs.debian.org/696503. I admit > that this was last updated ten years ago; I really need to get back to > that and get it to work one way or another. :-/
Ah, well, that wasn't so easy to understand from man-db's bug list. :-) > This is an interesting observation. I do think that seccomp is valuable > defence in general given the amount of rather ad-hoc parsing that's > going on, but I hadn't noticed that it made such a big difference to > mandb performance. (About 3ms per subprocess here purely for setup, I > think.) > > The seccomp sandbox is mainly so that man-db can more safely operate on > manual pages shipped by packaging systems that expect to run code under > confinement and thus can ship relatively untrusted code (such as snapd). > If we limited the mandb invocation in the postinst to only > /usr/share/man, we could reasonably assume that all pages there were > installed by dpkg, and those don't need the same level of care since any > package installed by dpkg can run code directly as root anyway. It > would then be reasonable to run it under MAN_DISABLE_SECCOMP=1. Let me see if I understand the threat model here, to see where you're coming from. The idea is that the user installs a snap (whose installation and running is nominally sandboxed), which contains a malicious man page containing an arbitrary execution exploit for man-db's apropos parser or similar? That sounds like something that is possible for the postinst to ignore, yes. OTOH, if you assume packages can carry malicious man files, is it also part of the threat model that it could be overriding other package's documentation with spam and/or really bad advice (like “run this vulnerable script as root”)? TBH man-db sounds fairly safe compared to many other things, though. I've only skimmed the code, but as I understand it, a typical mandb run sends the file through gzip and a flex parser? If you find a bug in zlib or in flex, I would assume there are _much_ juicier targets out there than dropping a malicious man page into a snap -- so we're really left with the surrounding code that takes the output of the flex parser. Which isn't really all that much code. I understand these are the famous last words, of course... :-) I understand that man(1) itself pushes things through groff, which is a different story -- but only for the case where you run man setuid. Of course, rewriting these into a safe language would be the best solution, but it's probably out-of-scope. > The difference between MAN_DISABLE_SECCOMP=1 and building without > libseccomp looks like a relatively simple fix, at least: > > > https://gitlab.com/cjwatson/man-db/-/commit/50200d151dfedb9d5064ec7008c09f6cf0f5ee24 OK, that's a good thing. > I really don't want to make apropos less useful. I'll have another go > at tackling the core problem here; but the fixes described above should > at least get us back to merely the situation we've had for years, rather > than the (as you say) much worse behaviour recently. Yes, it would be really nice to have speedups and not just no regressions. :-) We keep installing more packages and man pages, so software probably needs to be made for larger data sets eventually. It's a bit like dpkg -- a lot of the design decisions made sense with 100 packages and 10k files, but when you get to 10x or 100x that, it starts getting a real pain point even if you just freeze it and don't make it any slower. I took a look at mandb's profile in perf, and even after turning off libseccomp, it appears that perhaps 10–11% of its time is spent doing real work (decompression, lexer, malloc, character set conversion). The rest is kernel overhead from launching and exiting pipelines, which is a fairly steep price. On the surface of it, it's not immediately obvious to me what the pipeline abstraction is all about. Is there a reason why you can't just open the man page, decompress it in one shot and then use fmemopen()? (Or similarly, open_memstream().) If it's bigger than, say, 16 MB uncompressed, you can probably just kick it from apropos. > Given #696503, would you mind if I repurposed this bug to be just about > fixing the seccomp-related performance regression? I think the plan > above will let me deal with just that part of it relatively quickly, and > the fixes would be small enough to be viable candidates for stable. Sure, that would be fine. It would perhaps be useful to make 696503 a bit clearer to the casual observer, though; i.e., link it to man-db somehow and make the title more about performance. /* Steinar */ -- Homepage: https://www.sesse.net/