Yan Zhao <yan.y.z...@intel.com> writes:

> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
>> +/*
>> + * Allocates and then caches a folio in the filemap. Returns a folio with
>> + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
>> + */
>> +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
>> +                                                            pgoff_t index)
>> +{
>> +    struct kvm_gmem_hugetlb *hgmem;
>> +    pgoff_t aligned_index;
>> +    struct folio *folio;
>> +    int nr_pages;
>> +    int ret;
>> +
>> +    hgmem = kvm_gmem_hgmem(inode);
>> +    folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
>> +    if (IS_ERR(folio))
>> +            return folio;
>> +
>> +    nr_pages = 1UL << huge_page_order(hgmem->h);
>> +    aligned_index = round_down(index, nr_pages);
> There may be a gap here.
>
> When a guest_memfd is bound to a slot where slot->base_gfn is not aligned to
> 2M/1G and slot->gmem.pgoff is 0, even if an index is 2M/1G aligned, the
> corresponding GFN is not 2M/1G aligned.

Thanks for looking into this.

In the 1G page support for guest_memfd, the offset and size are always
aligned to the hugepage size requested at guest_memfd creation time. It
is true, though, that when binding to a memslot, slot->base_gfn and
slot->npages may not be hugepage aligned.
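To illustrate the scenario using the names from this series (the helper
itself is hypothetical, but slot->gmem.pgoff is the real field): the
GFN backing a guest_memfd index depends on the slot binding, so a
hugepage-aligned index does not guarantee a hugepage-aligned GFN when
slot->base_gfn is misaligned.

        /* Hypothetical helper: map a guest_memfd index to the GFN it backs. */
        static gfn_t gmem_index_to_gfn(struct kvm_memory_slot *slot, pgoff_t index)
        {
                return slot->base_gfn + (index - slot->gmem.pgoff);
        }

e.g. with slot->base_gfn == 0x201 and slot->gmem.pgoff == 0, the
2M-aligned index 0x200 maps to GFN 0x401, which is not 2M aligned.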

>
> However, TDX requires that private huge pages be 2M aligned in GFN.
>

IIUC, other factors also contribute to determining the mapping level in
the guest page tables, such as lpage_info and
.private_max_mapping_level() in kvm_x86_ops.

If slot->base_gfn and slot->npages are not hugepage aligned, lpage_info
will track that and not allow faulting into guest page tables at higher
granularity.
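As a rough sketch of what I mean, loosely modeled on the lpage_info
setup in KVM x86's kvm_alloc_memslot_metadata() (kvm_lpage_info,
KVM_PAGES_PER_HPAGE() and disallow_lpage are the real KVM names; the
function itself is illustrative):

        /*
         * If slot->base_gfn or the slot's end GFN is not aligned to the
         * page size of this level, mark the head/tail lpage_info entries
         * as disallowed so KVM will not map those edges with large pages.
         */
        static void mark_unaligned_edges(struct kvm_memory_slot *slot,
                                         struct kvm_lpage_info *linfo,
                                         int level, unsigned long lpages)
        {
                unsigned long pages = KVM_PAGES_PER_HPAGE(level);

                if (slot->base_gfn & (pages - 1))
                        linfo[0].disallow_lpage = 1;
                if ((slot->base_gfn + slot->npages) & (pages - 1))
                        linfo[lpages - 1].disallow_lpage = 1;
        }

KVM's fault path consults these entries when picking the mapping level,
so the misaligned edges fall back to smaller mappings without
guest_memfd having to know about the slot layout.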

Hence I think it is okay to leave it to KVM to fault pages into the
guest correctly. guest_memfd will just maintain the invariant that
offset and size are hugepage aligned, but will not require that
slot->base_gfn and slot->npages are hugepage aligned. This behavior is
consistent with other backing memory for guests, like regular shmem or
HugeTLB.

>> +    ret = kvm_gmem_hugetlb_filemap_add_folio(inode->i_mapping, folio,
>> +                                             aligned_index,
>> +                                             htlb_alloc_mask(hgmem->h));
>> +    WARN_ON(ret);
>> +
>>      spin_lock(&inode->i_lock);
>>      inode->i_blocks += blocks_per_huge_page(hgmem->h);
>>      spin_unlock(&inode->i_lock);
>>  
>> -    return page_folio(requested_page);
>> +    return folio;
>> +}
