* Christian Brauner:

> I'm sorry but I have some doubts about this new "rule". The idea of
> being able to reliably trigger an error for a system call other than
> EPERM might have merit in some scenarios but justifying it via a bug in
> a userspace standard is not enough in my opinion.
>
> The solution is to fix the standard to mandate ENOSYS. This is the
> correct error for this exact scenario and standards can be changed.
> I don't think it is the kernel's job to work around a deliberate
> userspace decision to use EPERM and not ENOSYS. The kernel's system call
> design should not be informed by this especially since this is clearly
> not a kernel bug.
>
> Apart from that I have doubts that this is in any shape or form
> enforceable. Not just because in principle there might be system calls
> that only return EPERM on error but also because this requirement feels
> arbitrary and I doubt developers will feel bound by it or people will
> check for it.
>
>> +
>> +If a system call has such error behavior, upon encountering an
>> +``EPERM`` error, userspace applications can perform further
>> +invocations of the same system call to check if the ``EPERM`` error
>> +persists for those known error conditions. If those also fail with
>> +``EPERM``, that likely means that the original ``EPERM`` error was the
>> +result of a seccomp filter, and should be treated like ``ENOSYS``
>
> I think that this "approach" alone should illustrate that this is the
> wrong way to approach this. It's hacky and requires exercising a system
> call multiple times just to find out whether or not it is supported.
> The only application that would possibly do this is probably glibc.
> This seems to be the complete wrong way of solving this problem.

Thank you for your feedback. I appreciate it.

I agree that the standard should mandate ENOSYS, and I've just proposed
a specification change here:

  <https://groups.google.com/a/opencontainers.org/g/dev/c/8Phfq3VBxtw>

However, such a change may take some time to implement.

Meanwhile, we have the problem today with glibc that it wants to use the
faccessat2 system call but it can't. I've been told that it would make
glibc incompatible with the public cloud and Docker. The best solution
I could come up with is this awkward probing sequence. (Just checking
for the zero flags argument is not sufficient because systemd calls
fchmodat with AT_SYMLINK_NOFOLLOW.)

I do not wish to put the probing sequence into glibc (upstream or
downstream) unless it is blessed to some degree by kernel developers.
I consider it quite ugly and would prefer if more of us share the blame.

We will face the same issue again with fchmodat2 (or fchmodat4 if that's
what its name is going to be). And we have been lucky in recent times
that we didn't need a new system call to fix a security vulnerability in
an existing system call in wide use by userspace (although faccessat2
comes rather close because it replaces a userspace permission check
approximation with a proper kernel check). The seccomp situation means
that we can't reliably introduce one, and the probing hack seems to be
the way out.

That's another reason for not just putting in the probing sequence
quietly and being done with it: I'd like to discuss this aspect in the
open, before we need it as part of a fix for some embargoed security
vulnerability.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
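
For concreteness, here is a rough sketch of the kind of probing sequence
discussed in this thread. It is an illustration only, not what glibc does
or will do. The names reinterpret_eperm and faccessat2_probed are
hypothetical, and the probe relies on the assumption that an unfiltered
kernel rejects invalid flag bits in faccessat2 with EINVAL rather than
EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>

#ifdef __linux__
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical helper: decide which error to report to the caller.
   first_err is the errno of the real call, probe_err is the errno of a
   second call that is expected to fail for a reason other than EPERM
   (0 if the probe unexpectedly succeeded).  */
int
reinterpret_eperm (int first_err, int probe_err)
{
  assert (first_err >= 0 && probe_err >= 0);
  if (first_err != EPERM)
    return first_err;   /* Only EPERM is ambiguous.  */
  if (probe_err == EPERM)
    return ENOSYS;      /* EPERM persists: likely a seccomp filter.  */
  return EPERM;         /* The kernel answered: a genuine EPERM.  */
}

#ifdef SYS_faccessat2
/* Hypothetical wrapper around the raw system call.  Returns 0 on
   success, -1 with errno set on failure.  */
int
faccessat2_probed (int dirfd, const char *path, int mode, int flags)
{
  if (syscall (SYS_faccessat2, dirfd, path, mode, flags) == 0)
    return 0;
  if (errno != EPERM)
    return -1;

  /* Probe with deliberately invalid flags.  Assumption: without a
     filter, this fails with EINVAL.  */
  long ret = syscall (SYS_faccessat2, AT_FDCWD, "", F_OK, ~0);
  int probe_err = ret == 0 ? 0 : errno;

  errno = reinterpret_eperm (EPERM, probe_err);
  return -1;
}
#endif
```

A caller would treat ENOSYS from faccessat2_probed as "fall back to the
old code path", exactly as for a kernel that lacks the system call. The
second system call on every EPERM is the cost Christian objects to above.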