On Mon, Jun 18, 2012 at 3:22 PM, Corey Bryant <cor...@linux.vnet.ibm.com> wrote:
>
>
> On 06/18/2012 04:33 AM, Daniel P. Berrange wrote:
>>
>> On Fri, Jun 15, 2012 at 07:04:45PM +0000, Blue Swirl wrote:
>>>
>>> On Wed, Jun 13, 2012 at 8:33 PM, Daniel P. Berrange <berra...@redhat.com> wrote:
>>>>
>>>> On Wed, Jun 13, 2012 at 07:56:06PM +0000, Blue Swirl wrote:
>>>>>
>>>>> On Wed, Jun 13, 2012 at 7:20 PM, Eduardo Otubo <ot...@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>> I added a syscall struct using priority levels as described in the
>>>>>> libseccomp man page. The priority numbers are based to the frequency
>>>>>> they appear in a sample strace from a regular qemu guest run under
>>>>>> libvirt.
>>>>>>
>>>>>> Libseccomp generates linear BPF code to filter system calls, those rules
>>>>>> are read one after another. The priority system places the most common
>>>>>> rules first in order to reduce the overhead when processing them.
>>>>>>
>>>>>> Also, since this is just a first RFC, the whitelist is a little raw. We
>>>>>> might need your help to improve, test and fine tune the set of system
>>>>>> calls.
>>>>>>
>>>>>> v2: Fixed some style issues
>>>>>>     Removed code from vl.c and created qemu-seccomp.[ch]
>>>>>>     Now using ARRAY_SIZE macro
>>>>>>     Added more syscalls without priority/frequency set yet
>>>>>>
>>>>>> Signed-off-by: Eduardo Otubo <ot...@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  qemu-seccomp.c |   73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  qemu-seccomp.h |    9 +++++++
>>>>>>  vl.c           |    7 ++++++
>>>>>>  3 files changed, 89 insertions(+)
>>>>>>  create mode 100644 qemu-seccomp.c
>>>>>>  create mode 100644 qemu-seccomp.h
>>>>>>
>>>>>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>>>>>> new file mode 100644
>>>>>> index 0000000..048b7ba
>>>>>> --- /dev/null
>>>>>> +++ b/qemu-seccomp.c
>>>>>> @@ -0,0 +1,73 @@
>>>>>
>>>>>
>>>>> Copyright and license info missing.
>>>>>
>>>>>> +#include <stdio.h>
>>>>>> +#include <seccomp.h>
>>>>>> +#include "qemu-seccomp.h"
>>>>>> +
>>>>>> +static struct QemuSeccompSyscall seccomp_whitelist[] = {
>>>>>
>>>>>
>>>>> 'const'
>>>>>
>>>>>> +    { SCMP_SYS(timer_settime), 255 },
>>>>>> +    { SCMP_SYS(timer_gettime), 254 },
>>>>>> +    { SCMP_SYS(futex), 253 },
>>>>>> +    { SCMP_SYS(select), 252 },
>>>>>> +    { SCMP_SYS(recvfrom), 251 },
>>>>>> +    { SCMP_SYS(sendto), 250 },
>>>>>> +    { SCMP_SYS(read), 249 },
>>>>>> +    { SCMP_SYS(brk), 248 },
>>>>>> +    { SCMP_SYS(clone), 247 },
>>>>>> +    { SCMP_SYS(mmap), 247 },
>>>>>> +    { SCMP_SYS(mprotect), 246 },
>>>>>> +    { SCMP_SYS(ioctl), 245 },
>>>>>> +    { SCMP_SYS(recvmsg), 245 },
>>>>>> +    { SCMP_SYS(sendmsg), 245 },
>>>>>> +    { SCMP_SYS(accept), 245 },
>>>>>> +    { SCMP_SYS(connect), 245 },
>>>>>> +    { SCMP_SYS(bind), 245 },
>>>>>
>>>>>
>>>>> It would be nice to avoid connect() and bind(). Perhaps seccomp init
>>>>> should be postponed to after all sockets have been created?
>>>>
>>>>
>>>> If you want to migrate your guest, you need to be able to
>>>> call connect() at an arbitrary point in the QEMU process'
>>>> lifecycle. So you can't avoid allowing connect(). Similarly
>>>> if you want to allow hotplug of NICs (and their backends)
>>>> then you need to have both bind() + connect() available.
>>>
>>>
>>> That's bad. Migration could conceivably be extended to use file
>>> descriptor passing, but hotplug is more tricky.
>>
>>
>> As with execve(), i'm reporting this on the basis that on the previous
>> patch posting I was told we must whitelist any syscalls QEMU can
>> conceivably use to avoid any loss in functionality.
>
>
> Thanks for pointing out syscalls needed for the whitelist.
>
> As Paul has already mentioned, it was recommended that we restrict all of
> QEMU (as a single process) from the start of execution.
> This is opposed to other options of restricting QEMU from the time that
> vCPUS start, further restricting based on syscall parms, or decomposing
> QEMU into multiple processes that are individually restricted with their
> own seccomp whitelists.

Can each thread have separate seccomp whitelists? For example, CPU
threads should need almost nothing, while the I/O thread needs I/O.

> I think this approach is a good starting point that can be further tuned in
> the future. And as with most security measures, defense in depth improves
> the cause (e.g. combining seccomp with DAC or MAC).

Agreed.

>
> --
> Regards,
> Corey
>
>
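
For reference, the priority idea from the quoted patch description (rules are
checked one after another, so the most frequent syscalls should come first)
can be sketched in plain C. This is only a toy model and not how libseccomp
generates BPF: `struct rule`, `order_rules()` and `comparisons_to_match()` are
hypothetical names made up for illustration, and the syscall numbers used with
it are arbitrary integers rather than real syscall numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's { syscall, priority } entries. */
struct rule {
    int syscall_nr;          /* illustrative value, not a real syscall number */
    unsigned char priority;  /* 0..255, higher means "check earlier" */
};

/* qsort() comparator: higher priority sorts first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct rule *ra = a;
    const struct rule *rb = b;
    return (int)rb->priority - (int)ra->priority;
}

/* Order the rules the way a priority-aware filter generator might. */
static void order_rules(struct rule *rules, size_t n)
{
    assert(rules != NULL);
    qsort(rules, n, sizeof(rules[0]), by_priority_desc);
}

/*
 * Walk the ordered rules linearly, as a linear filter would.
 * Returns the number of comparisons needed to reach a match,
 * or 0 if the syscall is not whitelisted at all.
 */
static size_t comparisons_to_match(const struct rule *rules, size_t n,
                                   int syscall_nr)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].syscall_nr == syscall_nr) {
            return i + 1;
        }
    }
    return 0;
}
```

With the rules ordered this way, a high-priority entry matches after a single
comparison, while a rarely used entry pays for every rule ahead of it. In
libseccomp itself the priority is passed as a hint through
seccomp_syscall_priority(), and the library decides the final layout.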