On 01/05/2020 15:54, Leonardo Bras wrote:
> If SELinux is setup without 'execmem' permission for qemu, all mmap
> with (PROT_WRITE | PROT_EXEC) will fail and print a warning in
> SELinux log.
>
> If "nvlink2-mr" memory allocation fails (fist diff), it will cause
> guest NUMA nodes to not be correctly configured (V100 memory will
> not be visible for guest, nor its NUMA nodes).
>
> Not having 'execmem' permission is intesting for virtual machines to
> avoid buffer-overflow based attacks, and it's adopted in distros
> like RHEL.
>
> So, removing the PROT_EXEC flag seems the right thing to do.
>
> Browsing some other code that mmaps memory for usage with
> memory_region_init_ram_device_ptr, I could notice it's usual to
> not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be
> no problem around this.
>
> Signed-off-by: Leonardo Bras <leobra...@gmail.com>
> ---
> hw/vfio/pci-quirks.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 2d348f8237..124d4f57e1 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1620,7 +1620,7 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev,
> Error **errp)
> }
> cap = (void *) hdr;
>
> - p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
> + p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE,
Unlike other users of memory_region_init_ram_device_ptr(), this memory
is more like a normal RAM, it is cache coherent and you can execute code
from it. I am unaware if this happens in practice but it easily could.
Having said that, I can see QEMU is not setting PROT_EXEC even for RAM
but KVM sets it anyway in a PTE in the partition scope tree (== guest
physical-to-host physical translation table) at [1] which is fine, I
just checked, with the change, the exec bit is still in the partition tree.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kvm/book3s_64_mmu_radix.c?h=v5.7-rc3#n857
> MAP_SHARED, vdev->vbasedev.fd, nv2reg->offset);
> if (p == MAP_FAILED) {
> ret = -errno;
> @@ -1680,7 +1680,7 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error
> **errp)
>
> /* Some NVLink bridges may not have assigned ATSD */
> if (atsdreg->size) {
> - p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
> + p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE,
This chunk is ok as ATSD is a set of registers.
Reviewed-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> MAP_SHARED, vdev->vbasedev.fd, atsdreg->offset);
> if (p == MAP_FAILED) {
> ret = -errno;
>
--
Alexey