Hi,

On 2019-08-28 11:13:27 -0400, Joe Conway wrote:
> Recent security best-practices recommend, and certain highly
> security-conscious organizations are beginning to require, that SECCOMP
> be used to the extent possible. The major web browsers, container
> runtime engines, and systemd are all examples of software that already
> support seccomp.

Maybe I'm missing something, but it's not clear to me what meaningful
attack surface can be reduced for PostgreSQL by forbidding certain
syscalls, given the wide variety of syscalls required to run postgres.
That's different from something like a browser's CSS process, or such,
which really doesn't need much beyond some IPC and memory allocations.
But postgres is going to need syscalls as broad as fork/clone, exec,
connect, shm*, etc. I guess you can argue that we'd still reduce the
attack surface for kernel escalations, but that seems like a pretty
small win compared to the cost.

> * With built-in support, it is possible to lock down backend processes
>   more tightly than the postmaster.

Which important syscalls would you get away with removing in backends
that postmaster needs? I think the only one - which is a good one
though - that I can think of is listen(). But even that might be too
restrictive for some PLs running out of process.

My main problem with seccomp is that it's *incredibly* fragile,
especially for a program as complex as postgres. We've already had
seccomp-related bug reports on the list, even just due to the very
permissive filtering by some container solutions.

There are regularly new syscalls (e.g. epoll_create1(), and we'll soon
get openat2()), different versions of glibc use different syscalls
(e.g. switching from open() to always using openat()), the system
configuration influences which syscalls are being used (e.g. vsyscalls
only being used for certain clock sources), and kernel bugfixes change
the exact set of syscalls being used ([1]).

[1] https://lwn.net/Articles/795128/

Then there's also the issue that many extensions are going to need
additional syscalls.

> Notes on usage:
> ===============
> In order to determine your minimally required allow lists, do something
> like the following on a non-production server with the same architecture
> as production:
> c) Cut and paste the result as the value of session_syscall_allow.

That seems nearly guaranteed to miss a significant fraction of
syscalls. There's just no way we're going to cover all the potential
paths and configurations in our testsuite.

I think if you actually wanted to do something like this, you'd need to
use static analysis to come up with a more reliable list.

Greetings,

Andres Freund
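
To make the coverage concern concrete, here is a minimal sketch of how
an allow list recorded on one machine can reject syscalls that show up
on another. The syscall sets are invented for illustration (they are
not measured from a real postgres run), and missing_syscalls() is a
hypothetical helper, not part of the proposed patch.

```python
# Illustrative only: both syscall sets below are made up for this example.

def missing_syscalls(allow_list, observed):
    """Return the syscalls in `observed` that `allow_list` would reject."""
    return sorted(set(observed) - set(allow_list))

# Hypothetical allow list, as if recorded while tracing a test run.
traced_on_test_box = {
    "read", "write", "open", "close", "epoll_create", "epoll_wait",
}

# Hypothetical production behaviour: the libc there calls openat()
# rather than open(), and the code path uses epoll_create1().
used_in_production = {
    "read", "write", "openat", "close", "epoll_create1", "epoll_wait",
}

blocked = missing_syscalls(traced_on_test_box, used_in_production)
print(blocked)  # ['epoll_create1', 'openat']
```

Any syscall that ends up in such a difference would be denied at
runtime, even though nothing in the application itself changed.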