On Thu, Jun 18, 2026, Ackerley Tng wrote:
> Introduce base support for KVM_SET_MEMORY_ATTRIBUTES2 in guest_memfd, which
> just updates attributes tracked by guest_memfd.
> 
> Validate input fields in general. Guard usage of KVM_SET_MEMORY_ATTRIBUTES2
> by making sure requested attributes are supported for this instance of kvm.
> 
> A new KVM_SET_MEMORY_ATTRIBUTES2 is defined to support writes (unlike

Phrase this as a command using imperative mood.  The wording is also weird,
because "support writes" makes it sound like it allows controlling WRITE
attributes, whereas what you mean by "support writes" is "allowing KVM to write
back error information to the struct without technically violating the
semantics embedded in the ioctl".  It's doubly confusing because the macros use
a different polarity: IOW means userspace is writing, but this implicitly
refers to IOW as "reads".
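
To make the polarity concrete, here is a small userspace sketch, assuming the
encoding macros from <linux/ioctl.h>.  The demo_* structs are stand-ins that
mirror the layouts in this patch (they are not the uapi definitions), and
DEMO_KVMIO is hardcoded to 0xAE on the assumption that it matches KVMIO.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical stand-ins for the uapi structs; the names are illustrative. */
struct demo_memory_attributes {
	__u64 address;
	__u64 size;
	__u64 attributes;
	__u64 flags;
};

struct demo_memory_attributes2 {
	union {
		__u64 address;
		__u64 offset;
	};
	__u64 size;
	__u64 attributes;
	__u64 flags;
	__u64 reserved[12];
};

#define DEMO_KVMIO 0xAE	/* assumption: same value as KVMIO */

#define DEMO_SET_MEMORY_ATTRIBUTES \
	_IOW(DEMO_KVMIO, 0xd2, struct demo_memory_attributes)
#define DEMO_SET_MEMORY_ATTRIBUTES2 \
	_IOWR(DEMO_KVMIO, 0xd2, struct demo_memory_attributes2)

/*
 * The direction bits are named from userspace's point of view: _IOC_WRITE
 * means userspace writes the payload (the kernel only reads it), _IOC_READ
 * means userspace reads it back (the kernel may write to it).
 */
static int demo_kernel_may_write_back(unsigned int cmd)
{
	return !!(_IOC_DIR(cmd) & _IOC_READ);
}
```

Both commands share the 0xd2 number but encode to different values, because
the direction and payload size are part of the command.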

> KVM_SET_MEMORY_ATTRIBUTES) in addition to reads so it can provide error
> details to userspace. This will be used in a later patch.
> 
> The two ioctls use their corresponding structs with no overlap, but
> backward compatibility is baked in for future support of
> KVM_SET_MEMORY_ATTRIBUTES2 and struct kvm_memory_attributes2 in the VM
> ioctl.

I don't understand what this paragraph is trying to say with respect to
backwards compatibility.  It's a new ioctl and struct, there's no compatibility
in sight.

E.g.

  Add a new ioctl (and matching struct), KVM_SET_MEMORY_ATTRIBUTES2, using
  the same base ioctl number (0xd2), but with R/W semantics for the kernel
  instead of just read semantics.  "Officially" documenting that KVM writes
  to the payload will allow KVM to support partial/incremental conversions,
  instead of all-or-nothing updates (which requires complex unwinding), by
  recording the failing offset if an error occurs.

  Opportunistically add a new struct as well, even though KVM could squeeze
  the error offset into "struct kvm_memory_attributes", as there's no cost
  to doing so in practice.  Pad the struct with a pile of extra space to try
  and avoid ending up with "struct kvm_memory_attributes3" in the future.
  Use the same layout for the fields that are common to version 1 of the
  struct, e.g. to ease upgrading userspace, and to provide flexibility if KVM
  ever adds support for KVM_SET_MEMORY_ATTRIBUTES2 at VM scope.
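
For illustration only, the resume-from-failure pattern that the write-back
enables could look roughly like the sketch below.  Everything here is
hypothetical: this patch does not add an error offset field, so the
error_offset member, the -EAGAIN return, and all demo_* names are assumptions,
and demo_set_attributes() is a mock rather than the real ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL

/* Hypothetical payload; error_offset is NOT a field in the posted patch. */
struct demo_attrs {
	uint64_t offset;
	uint64_t size;
	uint64_t attributes;
	uint64_t error_offset;
};

/* Pretend one page is transiently unconvertible for a single attempt. */
static uint64_t demo_busy_page = 2;
static int demo_busy_budget = 1;

/* Mock "kernel" side: walk pages in order and stop at the first failure. */
static int demo_set_attributes(struct demo_attrs *a)
{
	uint64_t off;

	for (off = a->offset; off < a->offset + a->size; off += DEMO_PAGE_SIZE) {
		if (off / DEMO_PAGE_SIZE == demo_busy_page && demo_busy_budget > 0) {
			demo_busy_budget--;
			/* Writing to the payload is what needs R/W semantics. */
			a->error_offset = off;
			return -EAGAIN;
		}
	}
	return 0;
}

/* "Userspace" side: resume at the failing offset instead of starting over. */
static int demo_convert_range(uint64_t offset, uint64_t size,
			      uint64_t attributes, int max_attempts)
{
	struct demo_attrs a = {
		.offset = offset,
		.size = size,
		.attributes = attributes,
	};
	int attempts = 0;
	int r;

	while ((r = demo_set_attributes(&a)) == -EAGAIN) {
		if (++attempts >= max_attempts)
			return r;
		/* Pages before error_offset are already converted. */
		a.size -= a.error_offset - a.offset;
		a.offset = a.error_offset;
	}
	return r;
}
```

The point of the sketch is that nothing has to be unwound: the pages before
the failing offset stay converted, and the caller only retries the remainder.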

> The process of setting memory attributes is set up such that the later half
> will not fail due to allocation. Any necessary checks are performed before
> the point of no return.

Explain *why*.  Readers can usually understand the "what" by reading the code,
but it's much harder to discern *why* things were done a certain way.  Some
things go without saying, e.g. "validate input fields", but in that case, just
drop the changelog blurb (if we _weren't_ validating input, *that* would be
interesting and worth calling out).

> Co-developed-by: Vishal Annapurve <[email protected]>
> Signed-off-by: Vishal Annapurve <[email protected]>
> Co-developed-by: Sean Christoperson <[email protected]>
> Signed-off-by: Sean Christoperson <[email protected]>
> Reviewed-by: Fuad Tabba <[email protected]>
> Signed-off-by: Ackerley Tng <[email protected]>
> ---
>  include/uapi/linux/kvm.h |  13 ++++++
>  virt/kvm/Kconfig         |   1 +
>  virt/kvm/guest_memfd.c   | 116 +++++++++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |  12 +++++
>  4 files changed, 142 insertions(+)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 419011097fa8e..956877a6aab05 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1649,6 +1649,19 @@ struct kvm_memory_attributes {
>       __u64 flags;
>  };
>  
> +#define KVM_SET_MEMORY_ATTRIBUTES2              _IOWR(KVMIO,  0xd2, struct kvm_memory_attributes2)
> +
> +struct kvm_memory_attributes2 {
> +     union {
> +             __u64 address;
> +             __u64 offset;
> +     };
> +     __u64 size;
> +     __u64 attributes;
> +     __u64 flags;
> +     __u64 reserved[12];
> +};
> +
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>  
>  #define KVM_CREATE_GUEST_MEMFD       _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 297e4399fbd49..cfa2c78ba5fb9 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -102,6 +102,7 @@ config KVM_MMU_LOCKLESS_AGING
>  
>  config KVM_GUEST_MEMFD
>         select XARRAY_MULTI
> +       select KVM_MEMORY_ATTRIBUTES
>         bool
>  
>  config HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 65ce795c090d9..0d14548c1ed22 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -541,11 +541,127 @@ bool kvm_gmem_is_private(struct kvm *kvm, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_is_private);
>  
> +/*
> + * Preallocate memory for attributes to be stored on a maple tree, pointed to
> + * by mas.  Adjacent ranges with attributes identical to the new attributes
> + * will be merged.  Also sets mas's bounds up for storing attributes.
> + *
> + * This maintains the invariant that ranges with the same attributes will
> + * always be merged.
> + */
> +static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> +                                 pgoff_t start, size_t nr_pages)
> +{
> +     pgoff_t end = start + nr_pages;
> +     pgoff_t last = end - 1;
> +     void *entry;
> +
> +     /* Try extending range. entry is NULL on overflow/wrap-around. */
> +     mas_set(mas, end);
> +     entry = mas_find(mas, end);
> +     if (entry && xa_to_value(entry) == attributes)
> +             last = mas->last;
> +
> +     if (start > 0) {
> +             mas_set(mas, start - 1);
> +             entry = mas_find(mas, start - 1);
> +             if (entry && xa_to_value(entry) == attributes)
> +                     start = mas->index;
> +     }
> +
> +     mas_set_range(mas, start, last);
> +     return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> +}
> +
> +static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> +                                  size_t nr_pages, uint64_t attrs)
> +{
> +     struct address_space *mapping = inode->i_mapping;
> +     struct gmem_inode *gi = GMEM_I(inode);
> +     pgoff_t end = start + nr_pages;
> +     struct maple_tree *mt;
> +     struct ma_state mas;
> +     int r;
> +
> +     mt = &gi->attributes;
> +
> +     filemap_invalidate_lock(mapping);
> +
> +     mas_init(&mas, mt, start);
> +     r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> +     if (r)
> +             goto out;
> +
> +     /*
> +      * From this point on guest_memfd has performed necessary
> +      * checks and can proceed to do guest-breaking changes.
> +      */
> +
> +     kvm_gmem_invalidate_start(inode, start, end);
> +     mas_store_prealloc(&mas, xa_mk_value(attrs));
> +     kvm_gmem_invalidate_end(inode, start, end);
> +out:
> +     filemap_invalidate_unlock(mapping);
> +     return r;
> +}
> +
> +static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> +{
> +     struct gmem_file *f = file->private_data;
> +     struct inode *inode = file_inode(file);
> +     struct kvm_memory_attributes2 attrs;
> +     size_t nr_pages;
> +     pgoff_t index;
> +     int i;
> +
> +     if (copy_from_user(&attrs, argp, sizeof(attrs)))
> +             return -EFAULT;
> +
> +     if (attrs.flags)
> +             return -EINVAL;
> +     for (i = 0; i < ARRAY_SIZE(attrs.reserved); i++) {
> +             if (attrs.reserved[i])
> +                     return -EINVAL;
> +     }
> +     if (!kvm_arch_has_private_mem(f->kvm))
> +             return -EINVAL;
> +     if (attrs.attributes & ~KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +             return -EINVAL;
> +     if (attrs.size == 0 || attrs.offset + attrs.size < attrs.offset)
> +             return -EINVAL;
> +     if (!PAGE_ALIGNED(attrs.offset) || !PAGE_ALIGNED(attrs.size))
> +             return -EINVAL;
> +
> +     if (attrs.offset >= i_size_read(inode) ||
> +         attrs.offset + attrs.size > i_size_read(inode))
> +             return -EINVAL;
> +
> +     nr_pages = attrs.size >> PAGE_SHIFT;
> +     index = attrs.offset >> PAGE_SHIFT;
> +     return __kvm_gmem_set_attributes(inode, index, nr_pages,
> +                                      attrs.attributes);
> +}
> +
> +static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
> +                        unsigned long arg)
> +{
> +     switch (ioctl) {
> +     case KVM_SET_MEMORY_ATTRIBUTES2:
> +             if (!gmem_in_place_conversion)
> +                     return -ENOTTY;
> +
> +             return kvm_gmem_set_attributes(file, (void __user *)arg);
> +     default:
> +             return -ENOTTY;
> +     }
> +}
> +
>  static struct file_operations kvm_gmem_fops = {
>       .mmap           = kvm_gmem_mmap,
>       .open           = generic_file_open,
>       .release        = kvm_gmem_release,
>       .fallocate      = kvm_gmem_fallocate,
> +     .unlocked_ioctl = kvm_gmem_ioctl,
>  };
>  
>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 01761f6e25d25..a08b518cdb175 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -105,6 +105,18 @@ module_param(allow_unsafe_mappings, bool, 0444);
>  bool __ro_after_init gmem_in_place_conversion = false;
>  #endif
>  
> +#define MEMORY_ATTRIBUTES_MATCH(one, two)                            \

Use the same terminology as the memory region asserts, i.e.
SANITY_CHECK_MEM_ATTRIBUTES_FIELD.  MEMORY_ATTRIBUTES_MATCH() reads like a
helper that checks if the two objects have the same attributes.

And put the checks where they actually matter, i.e. in the case-statement for
KVM_SET_MEMORY_ATTRIBUTES (again, same as KVM_SET_USER_MEMORY_REGION).  The
only reason the layout matters for KVM is if we want to add VM-scoped support
for KVM_SET_MEMORY_ATTRIBUTES2 in the future, at which point we'll want to use
the same overlay shenanigans that we did for KVM_SET_USER_MEMORY_REGION2.

> +     static_assert(offsetof(struct kvm_memory_attributes, one) ==    \
> +                   offsetof(struct kvm_memory_attributes2, two));    \

And then once these are landed in function scope, use BUILD_BUG_ON() with a
do { ... } while (0).
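
A rough, untested sketch of that shape, written as a userspace stand-in so it
compiles on its own.  DEMO_BUILD_BUG_ON() and demo_sizeof_field() are
assumptions that approximate the kernel's BUILD_BUG_ON() and sizeof_field(),
the demo_* structs mirror the layouts in this patch, and
demo_check_overlay() stands in for the KVM_SET_MEMORY_ATTRIBUTES case in the
VM ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximations of BUILD_BUG_ON() and sizeof_field(). */
#define DEMO_BUILD_BUG_ON(cond) _Static_assert(!(cond), "layout mismatch")
#define demo_sizeof_field(type, member) sizeof(((type *)0)->member)

/* Hypothetical stand-ins for the two uapi structs. */
struct demo_memory_attributes {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

struct demo_memory_attributes2 {
	union {
		uint64_t address;
		uint64_t offset;
	};
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
	uint64_t reserved[12];
};

/* Fails the build if a common field moves or changes size. */
#define SANITY_CHECK_MEM_ATTRIBUTES_FIELD(field)				\
do {										\
	DEMO_BUILD_BUG_ON(offsetof(struct demo_memory_attributes, field) !=	\
			  offsetof(struct demo_memory_attributes2, field));	\
	DEMO_BUILD_BUG_ON(demo_sizeof_field(struct demo_memory_attributes, field) != \
			  demo_sizeof_field(struct demo_memory_attributes2, field)); \
} while (0)

/* Function scope, i.e. where the overlay would actually be relied upon. */
static int demo_check_overlay(void)
{
	SANITY_CHECK_MEM_ATTRIBUTES_FIELD(address);
	SANITY_CHECK_MEM_ATTRIBUTES_FIELD(size);
	SANITY_CHECK_MEM_ATTRIBUTES_FIELD(attributes);
	SANITY_CHECK_MEM_ATTRIBUTES_FIELD(flags);
	return 0;
}
```

All of the checking happens at compile time; the function body is empty at
runtime apart from the return.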

> +     static_assert(sizeof_field(struct kvm_memory_attributes, one) ==\
> +                   sizeof_field(struct kvm_memory_attributes2, two))
> +
> +/* Ensure the common parts of the two structs are identical. */
> +MEMORY_ATTRIBUTES_MATCH(address, address);
> +MEMORY_ATTRIBUTES_MATCH(size, size);
> +MEMORY_ATTRIBUTES_MATCH(attributes, attributes);
> +MEMORY_ATTRIBUTES_MATCH(flags, flags);

Please put these asserts in the location where the overlay matters.  Actually, I
don't think we need to enforce this?
