On Fri, Jun 29, 2012 at 3:27 PM, Corey Bryant <cor...@linux.vnet.ibm.com> wrote:
>
> On 06/28/2012 03:49 PM, Blue Swirl wrote:
>>
>> On Wed, Jun 27, 2012 at 9:25 PM, Anthony Liguori <anth...@codemonkey.ws>
>> wrote:
>>>
>>> On 06/21/2012 03:04 AM, Avi Kivity wrote:
>>>>
>>>> On 06/19/2012 09:58 PM, Blue Swirl wrote:
>>>>>>>
>>>>>>> At least qemu-ifup/down scripts, migration exec and smbd have been
>>>>>>> mentioned. Only the system calls made by smbd (for some version of
>>>>>>> it) can be known. The user could specify arbitrary commands for the
>>>>>>> others, those could be assumed to use some common (large) subset of
>>>>>>> system calls but I think the security value would be close to zero
>>>>>>> then.
>>>>>>
>>>>>> We're not trying to protect against the user, but against the guest.
>>>>>> If we assume the user wrote those scripts with care so they cannot be
>>>>>> exploited by the guest, then we are okay.
>>>>>
>>>>> My concern was that first we could accidentally filter a system call
>>>>> that changes the script or executable behavior, much like sendmail +
>>>>> capabilities bug, and then a guest could trigger running this
>>>>> script/executable and exploit the changed behavior.
>>>>
>>>> Ah, I see. I agree this is dangerous. We should probably disable exec
>>>> if we seccomp.
>>>
>>> There's no great place to jump into this thread so I guess I'll do it
>>> here.
>>>
>>> There is absolutely no doubt that white-listing syscalls that we
>>> currently use provides an improvement in security.
>>>
>>> We need to assume:
>>>
>>> 1) QEMU is run as an unprivileged user
>>>
>>> 2) QEMU is already heavily restricted by SELinux
>>>
>>> In this case, seccomp() is not being used to replace MAC or DAC. It's
>>> supplementing both of them by additionally filtering out syscalls that
>>> may have unknown kernel exploits in them. That's all this initial effort
>>> is about.
>>> Since its scope is so limited, we can simply enable it
>>> unconditionally too.
>>
>> I don't think the scope is limited in a safe way. What is the set of
>> system calls that can't ever cause problems for any possible ifup/down
>> scripts, migration exec helpers and various versions of smbd?
>>
>> For example, unlink() is missing. What if the ifup/down script needs
>> it for lock file cleanup? ftruncate()? Every socket syscall in case
>> LDAP is used to access user information by the libc?
>>
>> I think we can't define the safe set, except 'allow all'. I'd propose
>> one of the following to avoid breakage:
>>
>> 1. Allow all system calls for the initial patch, refactor later to
>> reduce the set. Useless until refactored.
>>
>> 2. Don't enable seccomp mode by default; when enabled, forbid
>> execve(). Limits functionality when enabled, no security benefit if
>> not enabled.
>
> It should be noted that PR_SET_NO_NEW_PRIVS is set by default when the
> seccomp filter is enabled by libseccomp. This prevents any new privileges
> from being granted on execve.

This is probably getting very hypothetical, but what happens if the
ifup/down scripts need to run a setuid/gid helper or a helper with
additional privileges from file system capabilities?

>
>>
>> 3. Before enabling seccomp, fork a helper process without restrictions
>> that is used to launch other programs. Needs some work.
>>
>>> After we have this initial support, then we can look at a -sandbox
>>> option. This option could prevent things like open()/execve() but that
>>> will come at a cost of features.
>>>
>>> I think the reasonable thing to do for -sandbox is to basically focus
>>> on the set of syscalls that QEMU would use if it were launched under
>>> libvirt. We should obviously make improvements (things like -blockdev)
>>> to make this even more restrictive.
>>>
>>> Who knows, maybe we end up having multiple types of sandboxes. A
>>> '-sandbox libvirt' and a '-sandbox user' where the latter is focused on
>>> the typical usage of an unprivileged user.
>>>
>>> But this is all stuff that can come later. We solve a big problem by
>>> just getting the initial whitelist support in.
>>
>> Fully agree, but we'd have to agree about what is a safe initial
>> whitelist.
>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>>> We have decomposed qemu to some extent, in that privileged operations
>>>>>> happen in libvirt. So the modes make sense - qemu has no idea whether
>>>>>> a privileged management system is controlling it or not.
>>>>>
>>>>> So with -seccomp, libvirt could tell QEMU that for example open(),
>>>>> execve(), bind() and connect() will never be needed?
>>>>
>>>> Yes.
>
> --
> Regards,
> Corey
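
As a rough illustration of the PR_SET_NO_NEW_PRIVS point above: once the
flag is set in a process it cannot be cleared and is inherited across
fork() and execve(), so a setuid/setgid helper, or one relying on file
capabilities, would still run but without the extra privileges. The sketch
below only sets the flag in a forked child and reads it back. It is
written in Python via ctypes for brevity rather than in QEMU's C, the
function names are made up for this example, and it assumes Linux 3.5 or
later. It is not taken from the patch under discussion.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def _prctl(option, arg2=0):
    """Call prctl(2) with unsigned long arguments and return its result."""
    libc = ctypes.CDLL(None, use_errno=True)
    ul = ctypes.c_ulong
    return libc.prctl(ctypes.c_int(option), ul(arg2), ul(0), ul(0), ul(0))


def no_new_privs_in_child():
    """Fork, set no_new_privs in the child only, and return the value the
    child reads back (1 if set), or 255 if prctl() failed.

    Hypothetical helper for illustration; not part of QEMU or libseccomp.
    """
    pid = os.fork()
    if pid == 0:
        status = 255
        try:
            if _prctl(PR_SET_NO_NEW_PRIVS, 1) == 0:
                status = _prctl(PR_GET_NO_NEW_PRIVS)
        finally:
            # Leave the child immediately, reporting the flag as exit code.
            os._exit(status & 0xFF)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)


if __name__ == "__main__":
    print("no_new_privs in child:", no_new_privs_in_child())
```

A helper forked before the flag (and the seccomp filter) is applied, as in
option 3, would not be subject to it, since the flag only propagates to
children created afterwards.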