Hello Daniel,

> On Oct 03, 2016, at 14:55, Daniel P. Berrange <berra...@redhat.com> wrote:
>
>> Well, it unlinks the file but the references are still there while the
>> descriptor isn't closed by this process, or by the one that receives the
>> descriptor (that is why is the "unlink" so early).
>>
>> If you check vhost_dev_log_resize(), it gets *possible* new vhost log
>> (if a new size is given) and informs the vhost dev driver about the new
>> log base (vhost_ops->vhost_set_log_base).
>>
>> For vhost_user, this means that the file descriptors for vhost logs are
>> likely going to be passed to vhost backend (fds[] in
>> vhost_user_set_log_base). This is just one example, not sure about
>> others.
>>
>> Probably the best approach here, like what Marc-André said, is to create
>> some sort of TMPDIR, set by libvirt perhaps ?
>
> So you're saying that the file descriptor here is actually getting
> passed to a different process for it to use ?
>
> If so that means we definitely do not want this in TMPDIR. If we
> create a generic file in TMPDIR, then its going to have a generic
> security label. That means that the other process we're giving the
> FD to is going to have to be granted permission to access this FD
> and we certainly don't want to grant permission for it to access
> any of QEMU's other FDs. So for the SELinux integration, we'll
> need this FD to be in a specific directory, so that we can setup
> policy such that the file created gets given a specific SELinux
> label. We can then grant the other process access to only that
> particular file, and not anything else of QEMU's.
>
> This makes me wonder about the memfd_create() code path too - we'll
> again not want that external process to be granted access to arbitrary
> FDs of QEMU's and I'm not sure of a way to get the memfd FD to have
> a specific label. So I think it is possible that when using libvirt
> we'll want the ability to tell QEMU to *always* use an explicit file
> in a path libvirt specifies, and never use memfd even if available.
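
To make the behaviour under discussion concrete, here is a minimal
standalone sketch (not QEMU code) of the two mechanisms involved: a
temporary file that is unlinked right after creation, and its descriptor
being handed to a peer as SCM_RIGHTS ancillary data over a UNIX socket.
The names send_fd, recv_fd and demo_unlinked_fd_pass, the /tmp template
and the payload string are all hypothetical and only for illustration.
Both ends of the socketpair live in one process here to keep the sketch
short; with a real backend the receiver would be another process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>

/* Send one descriptor as SCM_RIGHTS ancillary data, with a 1-byte payload. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 0 on success; 'out' then holds what the receiving side read. */
int demo_unlinked_fd_pass(char *out, size_t outlen)
{
    char path[] = "/tmp/demo-log-XXXXXX";   /* hypothetical location */
    const char payload[] = "vhost log";     /* hypothetical content */
    struct stat st;
    int sv[2] = { -1, -1 }, fd, rfd = -1, ret = -1;
    ssize_t n;

    assert(outlen > 0);
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);   /* the name is gone, the inode lives while an fd is open */
    if (write(fd, payload, sizeof(payload)) != (ssize_t)sizeof(payload)) {
        goto done;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 ||
        send_fd(sv[0], fd) < 0) {
        goto done;
    }
    rfd = recv_fd(sv[1]);
    if (rfd < 0) {
        goto done;
    }
    close(fd);      /* the creator's copy is closed, the received copy remains */
    fd = -1;
    if (fstat(rfd, &st) < 0 || st.st_nlink != 0) {
        goto done;  /* expect no remaining directory entry */
    }
    n = pread(rfd, out, outlen - 1, 0);
    if (n < 0) {
        goto done;
    }
    out[n] = '\0';
    ret = 0;
done:
    if (fd >= 0) close(fd);
    if (rfd >= 0) close(rfd);
    if (sv[0] >= 0) close(sv[0]);
    if (sv[1] >= 0) close(sv[1]);
    return ret;
}
```

Calling demo_unlinked_fd_pass() from a small main() and checking for a
zero return is enough to see that the receiver can still read the file
after the name is removed and the creator has closed its copy. It also
shows why the label on the file matters: whatever receives the descriptor
needs permission to use that specific file.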

Check this execution path:

(vhost_vsock_device_realize)
vhost_dev_init
vhost_commit
|- vhost_get_log_size
|...
|- vhost_dev_log_resize

(vhost_dev_log_resize):
vhost_log_get -> here if the size is bigger, a new log is created
dev->vhost_ops->vhost_set_log_base() -> kernel or user vhost driver
vhost_log_put()

----

So,

* In the case of kernel mode, this is just:

vhost in kernel mode = vhost_kernel_set_log_base

return vhost_kernel_call(dev, VHOST_SET_LOG_BASE, &base);

which makes an ioctl on the dev->opaque file descriptor to set a new
vhost log base.

* But in the case of user mode:

vhost in user mode = vhost_user_set_log_base

which gets the log file descriptor (log->fd) and passes it to
vhost_user_write. vhost_user_write will call qemu_chr_fe_set_msgfds,
passing the log file descriptors to the backend vhost driver
(CharDriverState).

If I'm reading this right, if the backend driver is:

static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num)

it would check for:

if (!qio_channel_has_feature(s->ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {

and configure s->write_msgfds. These would then be sent in:

static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)

with "io_channel_send_full" + "qio_channel_writev_full" + io_writev from
QIOChannelClass.

----

https://www.berrange.com/posts/2016/08/16/

This, from your blog, probably confirms this behaviour:

"The migration code supports a number of different protocols besides
just “tcp:“. In particular it allows an “fd:” protocol to tell QEMU to
use a passed-in file descriptor, and an “exec:” protocol to tell QEMU to
launch an external command to tunnel the connection. It is desirable to
be able to use TLS with these protocols too, but when using TLS the
client QEMU needs to know the hostname of the target QEMU in order to
correctly validate the x509 certificate it receives. Thus, a second
“tls-hostname” parameter was added to allow QEMU to be informed of the
hostname to use for x509 certificate validation when using a non-tcp
migration protocol. This can be set on the source QEMU prior to starting
the migration using the “migrate_set_str_parameter” monitor command"

=)

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1626972

Title:
  QEMU memfd_create fallback mechanism change for security drivers

Status in QEMU:
  In Progress

Bug description:
  And, when libvirt starts using apparmor, and creating apparmor
  profiles for every virtual machine created in the compute nodes,
  mitaka qemu (2.5 - and upstream also) uses a fallback mechanism for
  creating shared memory for live-migrations. This fallback mechanism,
  on kernels 3.13 - that don't have the memfd_create() system call -
  tries to create files in the /tmp/ directory and fails, causing
  live-migration not to work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability
  = can't live migrate.

  From qemu 2.5, logic is on :

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd)
  {
      if (memfd_create)... ### only works with HWE kernels

      else ### 3.13 kernels, gets blocked by apparmor
         tmpdir = g_get_tmp_dir
         ...
         mfd = mkstemp(fname)
  }

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

  When leaving libvirt without apparmor capabilities (thus not
  confining virtual machines on compute nodes), the live migration
  works as expected, so, clearly, apparmor is stepping into the live
  migration. I'm sure that virtual machines have to be confined and
  that this isn't the desired behaviour...

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1626972/+subscriptions