On Tue, Jun 19, 2012 at 11:04 AM, Avi Kivity <a...@redhat.com> wrote:
> On 06/16/2012 09:46 AM, Blue Swirl wrote:
>> On Fri, Jun 15, 2012 at 9:36 PM, Paul Moore <pmo...@redhat.com> wrote:
>>> On Friday, June 15, 2012 09:23:46 PM Blue Swirl wrote:
>>>> On Fri, Jun 15, 2012 at 9:02 PM, Paul Moore <pmo...@redhat.com> wrote:
>>>> > On Friday, June 15, 2012 07:06:10 PM Blue Swirl wrote:
>>>> >> I think allowing execve() would render seccomp pretty much useless.
>>>> >
>>>> > Not necessarily.
>>>> >
>>>> > I'll agree that it does seem a bit odd to allow execve(), but there is
>>>> > still value in enabling seccomp to disable potentially buggy/exploitable
>>>> > syscalls. Let's not forget that we have over 300 syscalls on x86_64, not
>>>> > including the 32 bit versions, and even if we add all of the new syscalls
>>>> > suggested in this thread we are still talking about a small subset of
>>>> > syscalls. As far as security goes, the old adage of "less is more"
>>>> > applies.
>>>>
>>>> The helper program being executed could need any of the 300 system
>>>> calls, so we'd have to allow all.
>>>
>>> Don't we have some basic understanding of what the applications being exec'd
>>> will need to do? I sorta see your point, but allowing the entire set of
>>> syscalls seems a bit dramatic.
>>
>> At least qemu-ifup/down scripts, migration exec and smbd have been
>> mentioned. Only the system calls made by smbd (for some version of it)
>> can be known. The user could specify arbitrary commands for the
>> others, those could be assumed to use some common (large) subset of
>> system calls but I think the security value would be close to zero
>> then.
>
> We're not trying to protect against the user, but against the guest. If
> we assume the user wrote those scripts with care so they cannot be
> exploited by the guest, then we are okay.

My concern was that first we could accidentally filter a system call,
which changes the behavior of the script or executable (much like the
sendmail + capabilities bug), and then a guest could trigger a run of
that script/executable and exploit the changed behavior.

> However I agree with you that it would be better to restrict those
> syscalls. The scripts are already unnecessary if using a management
> system and migration supports passed file descriptors, so that leaves
> only smbd, which can probably be pre-execed.

File descriptor passing could also work for smbd.

>>>> > Protecting against the abuse and misuse of execve() is something that is
>>>> > better done with the host's access controls (traditional DAC, MAC via the
>>>> > LSM, etc.).
>>>>
>>>> How about seccomp mode selected by command line switch -seccomp, in
>>>> which bind/connect/open/execve are forbidden? The functionality
>>>> remaining would be somewhat limited (can't migrate or use SMB etc.
>>>> until refactoring of QEMU), but that way seccomp jail would be much
>>>> tighter.
>>>
>>> When I spoke to Anthony about this earlier (offline, sorry) he was opposed
>>> to requiring any switches or user interaction to enable seccomp. I'm not
>>> sure if his stance on this has changed any over the past few months.
>>
>> There could be two modes, strict mode (-seccomp) and default mode
>> (only some syscalls blocked). With the future decomposed QEMU, strict
>> seccomp mode would be default and the switch would be obsoleted. If
>> the decomposition is planned to happen soonish, adding the switch
>> would be just churn.
>
> We have decomposed qemu to some extent, in that privileged operations
> happen in libvirt. So the modes make sense - qemu has no idea whether a
> privileged management system is controlling it or not.

So with -seccomp, libvirt could tell QEMU that, for example, open(),
execve(), bind() and connect() will never be needed?
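
To make the two-mode idea concrete, here is a rough sketch of the policy
decision only. It is not QEMU code and does not install any real seccomp
filter. The mode names and the syscall_allowed() helper are made up for
illustration, and the default-mode blocklist is left empty because this
thread does not settle what it would contain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode names, for illustration only. */
enum sandbox_mode {
    MODE_DEFAULT, /* "only some syscalls blocked"; that list is not modelled here */
    MODE_STRICT   /* what a -seccomp switch would select */
};

/* The four syscalls named in this thread as forbidden in strict mode. */
static const char *const strict_denied[] = {
    "open", "execve", "bind", "connect"
};

/* Return true if the named syscall would be permitted under the given mode. */
bool syscall_allowed(enum sandbox_mode mode, const char *name)
{
    size_t i;

    if (mode != MODE_STRICT) {
        /* Assumption: the default-mode blocklist is omitted from this sketch. */
        return true;
    }
    for (i = 0; i < sizeof(strict_denied) / sizeof(strict_denied[0]); i++) {
        if (strcmp(name, strict_denied[i]) == 0) {
            return false;
        }
    }
    return true;
}
```

With this model, syscall_allowed(MODE_STRICT, "execve") is false while
syscall_allowed(MODE_DEFAULT, "execve") is true, which is the difference a
management system like libvirt would be selecting when it passes the switch.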

>>> In my perfect world, we would have a decomposed QEMU that functions as a
>>> series of processes connected via some sort of IPC; the exact divisions
>>> are a bit TBD and beyond the scope of this discussion. In this scenario
>>> we would be able to restrict QEMU with sVirt and seccomp to a much higher
>>> degree than we could with the current monolithic QEMU.
>>>
>>> I don't expect to see my perfect world any time soon, but in the meantime
>>> we can still improve the security of QEMU on Linux with these seccomp
>>> patches and for that reason I think it's a win. Since these patches don't
>>> expose anything at runtime (no knobs, switches, etc.) we leave ourselves
>>> plenty of flexibility for changing things in the future.
>>
>> Yes, I'm much in favor of adding seccomp support soon. But I just
>> wonder if this is really the best level of security we can reach now,
>> not assuming decomposed QEMU, but just minor tweaks?
>
> We might disable mprotect(PROT_EXEC) if running with kvm.
>
> --
> error compiling committee.c: too many arguments to function