On Fri, Jul 19, 2019 at 01:40:13PM -0400, Andy Lutomirski wrote:
> > On Jul 19, 2019, at 1:03 PM, Sean Christopherson
> > <sean.j.christopher...@intel.com> wrote:
> >
> > The generic vDSO implementation, starting with commit
> >
> >   7ac870747988 ("x86/vdso: Switch to generic vDSO implementation")
> >
> > breaks seccomp-enabled userspace on 32-bit x86 (i386) kernels.  Prior to
> > the generic implementation, the x86 vDSO used identical code for both
> > x86_64 and i386 kernels, which worked because it did all calculations using
> > structs with naturally sized variables, i.e. didn't use __kernel_timespec.
> >
> > The generic vDSO does its internal calculations using __kernel_timespec,
> > which in turn requires the i386 fallback syscall to use the 64-bit
> > variation, __NR_clock_gettime64.
>
> This is basically doomed to break eventually, right?

Just so I'm understanding: the vDSO change introduced code to make an
actual syscall on i386, which for most seccomp filters would be rejected?

> I’ve occasionally considered adding a concept of “seccomp aliases”.  The idea
> is that, if a filter returns anything other than ALLOW, we re-run it with a
> different nr that we dig out of a small list of such cases.  This would be
> limited to cases where the new syscall does the same thing with the same
> arguments.  Would that help here?

The kernel just sees this as a direct syscall. I guess it could whitelist
it by checking the return address?

> I want this for restart_syscall: I want to renumber it.

Oh man, don't get me started on restart_syscall. Some architectures make
it invisible to seccomp and others don't. ugh.

-- 
Kees Cook
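
For illustration only, here is a rough user-space sketch of the "seccomp aliases" idea quoted above: run the filter once, and if the verdict is anything other than ALLOW, look the syscall number up in a small alias table and re-run the filter with the older number. Nothing here is kernel code or an existing API. The names (run_with_aliases, legacy_filter, the verdict enum, the alias table) are all hypothetical, the filter is a plain C function standing in for a BPF program, and the two syscall numbers are assumed to be the i386 values. Whether clock_gettime64 really qualifies as "same thing with the same arguments" is exactly the open question in this thread.

```c
#include <stddef.h>

/* Hypothetical verdicts; real seccomp has more return actions than this. */
enum verdict { VERDICT_KILL = 0, VERDICT_ALLOW = 1 };

/* Assumed i386 syscall numbers, hard-coded so the sketch builds anywhere. */
enum { NR_CLOCK_GETTIME = 265, NR_CLOCK_GETTIME64 = 403 };

/* One entry maps a newer syscall number to the older one it may stand in for. */
struct alias {
    int new_nr;
    int old_nr;
};

/* Hypothetical alias list; the proposal only says it would be small. */
static const struct alias aliases[] = {
    { NR_CLOCK_GETTIME64, NR_CLOCK_GETTIME },
};

/* Stand-in for a seccomp filter: takes a syscall number, returns a verdict. */
typedef enum verdict (*filter_fn)(int nr);

/* A filter written before clock_gettime64 existed: allows only the old nr. */
static enum verdict legacy_filter(int nr)
{
    return nr == NR_CLOCK_GETTIME ? VERDICT_ALLOW : VERDICT_KILL;
}

/*
 * Run the filter on nr. If the result is not ALLOW and nr has an alias,
 * re-run the filter with the aliased (older) number and use that verdict.
 * Otherwise the original verdict stands.
 */
static enum verdict run_with_aliases(filter_fn filter, int nr)
{
    enum verdict v = filter(nr);

    if (v == VERDICT_ALLOW)
        return v;

    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
        if (aliases[i].new_nr == nr)
            return filter(aliases[i].old_nr);
    }
    return v;
}
```

With this sketch, legacy_filter alone rejects NR_CLOCK_GETTIME64, while run_with_aliases(legacy_filter, NR_CLOCK_GETTIME64) falls back to the old number and allows it. A syscall with no alias entry keeps its original verdict.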