(Adding Paul Moore to CC since he's done a lot of work on syscall whitelisting in QEMU)
On Wed, Sep 09, 2015 at 09:55:33PM -0400, namn...@safe-mail.net wrote: > This patch here adds argument filtering for three syscalls: > madvise(), shmget(), and shmctl(). > > The madvise() flags may need a few additions, but I couldn't find any common > cases where the extra flags were used. The only additions were ones I found by > grepping through the source code for all madvise-related flags: The current intention of the seccomp filter in QEMU, is that /all/ existing QEMU features continue to work unchanged. So even if a flag is used in a seemingly uncommon code path, we still need to allow that in a seccomp filter. Soo all these are needed: > $ grep -hro MADV_[A-Z]* qemu-2.3.0 | sort -u > MADV_DODUMP > MADV_DONTDUMP > MADV_DONTFORK > MADV_DONTNEED > MADV_HUGEPAGE > MADV_INVALID This one isn't a real constant as the syscall level > MADV_MERGEABLE > MADV_UNMERGEABLE > MADV_WILLNEED One thing to be wary off when checking for used syscalls in the source, is that a large amount of QEMU functionality is provided via 3rd party libraries, especially the crypto/TLS code, SPICE server, and various storage drivers for RBD, gluster, iSCSI and GTK/SDL for the UI. Probably not an issue for madvise, but this is in fact where the usage of shmget comes from - QEMU never directly uses it but (at least) GTK does call it. So in this case, we need to check at least the GTK source to ensure we covered all args. I wouldn't be surprised if SDL uses it too. This gets kind of tedious quite quickly.... > diff --git a/qemu-seccomp.c b/qemu-seccomp.c > index f9de0d3..0c11580 100644 > --- a/qemu-seccomp.c > +++ b/qemu-seccomp.c > @@ -264,6 +263,49 @@ int seccomp_start(void) > } > } > > + /* madvise */ > + static const int madvise_flags[] = { > + MADV_DONTFORK, > + MADV_DONTNEED, > + MADV_HUGEPAGE, > + MADV_MERGEABLE, So we need to add DODUMP, DONTDUMP, UNMERGABLE and WILLNEED here. That is still stricter than the previous allow-everything rule, so a net win. > + }; > + for (i = 0; i < ARRAY_SIZE(madvise_flags); i++) { > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 1, > + SCMP_A2(SCMP_CMP_EQ, madvise_flags[i])); > + if (rc < 0) { > + goto seccomp_return; > + } > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(madvise), 245); > + if (rc < 0) { > + goto seccomp_return; > + } > + > + /* shmget */ > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2, > + SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE), > + SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777)); I'm not familiar with semantics of these seccomp rules, but is this saying that the second arg must be exactly equal to IP_CREAT|0777 ? If the app passes IP_CREAT|0600, would that be permitted instead ? The latter is what I see gtk2 source code passing for mode. > + if (rc < 0) { > + goto seccomp_return; > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmget), 240); > + if (rc < 0) { > + goto seccomp_return; > + } > + > + /* shmctl */ > + rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2, > + SCMP_A1(SCMP_CMP_EQ, IPC_RMID), > + SCMP_A2(SCMP_CMP_EQ, 0)); > + if (rc < 0) { > + goto seccomp_return; > + } > + rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmctl), 240); > + if (rc < 0) { > + goto seccomp_return; > + } > + Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|