"Namsun Ch'o" <namn...@safe-mail.net> writes: >> Our intention since the beginning was to protect the host from the >> illegal guest operations. But you do have an interesting point about >> flaws on qemu itself. Perhaps this might be something I could work on to >> improve (start a bigger whitelist and get it tighter before guest >> launches). > > The seccomp filters are always passed on through execve(), so it would not be > possible to have the parent have chroot() whitelisted to chroot, then spawn a > child without it. As far as I know, even a root process cannot chroot another > process, even its child, so if the process is to chroot at all, it must have > the chroot syscall whitelisted. What can be done, however, is using the > argument passed to -chroot as the filter. The same could be done with setuid, > by having it only whitelist the uid which is given at -runas. > > An example, using chdir (I presume QEMU uses chdir(dir) then chroot(".")): > > sh# mkdir /tmp/chroot > sh# cat | gcc -lseccomp -x c - > #include <stdio.h> > #include <fcntl.h> > #include <seccomp.h> > > void main(void) > { > const char *dir = "/tmp/chroot"; > > scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_TRAP); > > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mkdir), 0); > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fchdir), 0); > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(open), 0); > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0); > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(chdir), 1, > SCMP_A0(SCMP_CMP_EQ, dir)); > seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(chroot), 1, > SCMP_A0(SCMP_CMP_EQ, ".")); > > seccomp_load(ctx); > > chdir(dir); > chroot("."); > > /* evil code starts here */ > const int fd = open(".", O_DIRECTORY); > mkdir("foo"); > chroot("foo"); > fchdir(fd); > chdir(".."); > chdir(".."); > chdir(".."); > chroot("."); > }^D^D > sh# strace -qq -e open,mkdir,chdir,chroot ./a.out 2>&1 | fold -s -w 80 > chdir("/tmp/chroot") = 0 > chroot(".") = 0 > open(".", 
O_RDONLY|O_DIRECTORY) = 3 > mkdir("foo", 0200000) = 0 > --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, > si_call_addr=0x34400a7d397, > si_syscall=161, si_arch=3221225534} --- > +++ killed by SIGSYS +++ > Bad system call > sh# grep 161 /usr/include/asm/unistd_64.h > #define __NR_chroot 161 > > So there's really no need to disable chroot() or setuid(), just filter the > arguments based on command line input to make them impossible to abuse.
Drawback: complexity. If we decide to limit ourselves to the original threat model (rogue guest), and enter the sandbox only after setup, we can keep things simpler.
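
One concrete source of that complexity, as far as I understand seccomp: a filter only sees the raw syscall argument values, and for a path argument that value is the pointer, not the string it points to. So a rule like SCMP_A0(SCMP_CMP_EQ, dir) in the quoted example whitelists the address held in dir rather than the text "/tmp/chroot". The sketch below models just that comparison in plain C. It makes no seccomp calls, and arg_rule, rule_for_path, rule_matches and copy_passes are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one "argument 0 equals datum" rule. The filter
 * is assumed to see only the 64-bit value passed in the register. */
struct arg_rule {
    uint64_t datum;
};

/* Models what a rule built from a path pointer would store:
 * the address of the string, not its contents. */
struct arg_rule rule_for_path(const char *path)
{
    assert(path != NULL);
    struct arg_rule r = { (uint64_t)(uintptr_t)path };
    return r;
}

/* Models the filter's decision: a raw value comparison. The memory
 * behind the pointer is never read. */
int rule_matches(struct arg_rule r, const char *arg)
{
    return r.datum == (uint64_t)(uintptr_t)arg;
}

/* Returns 1 only if a byte-identical copy of `path`, stored in a
 * different buffer, would still satisfy the rule built for `path`. */
int copy_passes(const char *path)
{
    char copy[256];
    struct arg_rule r = rule_for_path(path);

    snprintf(copy, sizeof copy, "%s", path);
    return strcmp(copy, path) == 0 && rule_matches(r, copy);
}
```

In this model copy_passes("/tmp/chroot") returns 0: the copy holds the same bytes but lives at a different address, so the rule rejects it. The reverse also holds, in that whatever string sits at the whitelisted address would match, so a filter keyed on path arguments needs more care than the example suggests.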