Yan Zhao <yan.y.z...@intel.com> writes:

> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
>> +/*
>> + * Allocates and then caches a folio in the filemap. Returns a folio with
>> + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
>> + */
>> +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
>> +							     pgoff_t index)
>> +{
>> +	struct kvm_gmem_hugetlb *hgmem;
>> +	pgoff_t aligned_index;
>> +	struct folio *folio;
>> +	int nr_pages;
>> +	int ret;
>> +
>> +	hgmem = kvm_gmem_hgmem(inode);
>> +	folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
>> +	if (IS_ERR(folio))
>> +		return folio;
>> +
>> +	nr_pages = 1UL << huge_page_order(hgmem->h);
>> +	aligned_index = round_down(index, nr_pages);
> Maybe a gap here.
>
> When a guest_memfd is bound to a slot where slot->base_gfn is not aligned
> to 2M/1G and slot->gmem.pgoff is 0, even if an index is 2M/1G aligned, the
> corresponding GFN is not 2M/1G aligned.
Thanks for looking into this.

In 1G page support for guest_memfd, the offset and size are always
aligned to the hugepage size requested at guest_memfd creation time,
and it is true that when binding to a memslot, slot->base_gfn and
slot->npages may not be hugepage aligned.

>
> However, TDX requires that private huge pages be 2M aligned in GFN.
>

IIUC other factors also contribute to determining the mapping level in
the guest page tables, like lpage_info and .private_max_mapping_level()
in kvm_x86_ops.

If slot->base_gfn and slot->npages are not hugepage aligned, lpage_info
will track that and not allow faulting into guest page tables at higher
granularity (see the sketch at the end of this mail). Hence I think it
is okay to leave it to KVM to fault pages into the guest correctly.

guest_memfd will just maintain the invariant that offset and size are
hugepage aligned, but will not require that slot->base_gfn and
slot->npages be hugepage aligned. This behavior is consistent with
other backing memory for guests, like regular shmem or HugeTLB.

>> +	ret = kvm_gmem_hugetlb_filemap_add_folio(inode->i_mapping, folio,
>> +						 aligned_index,
>> +						 htlb_alloc_mask(hgmem->h));
>> +	WARN_ON(ret);
>> +
>>  	spin_lock(&inode->i_lock);
>>  	inode->i_blocks += blocks_per_huge_page(hgmem->h);
>>  	spin_unlock(&inode->i_lock);
>>
>> -	return page_folio(requested_page);
>> +	return folio;
>> +}
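
To illustrate the lpage_info point above, here's a paraphrased sketch of
the slot-edge handling in kvm_alloc_memslot_metadata() (arch/x86/kvm/x86.c)
as I read it upstream today; allocation-failure handling and the ugfn
alignment check are elided:

	/*
	 * For each large page level, if the slot's start or end GFN is
	 * not aligned to that level's page size, mark the first/last
	 * lpage_info entry as disallowing a large page, so the fault
	 * path never installs a huge mapping across the slot edge.
	 */
	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
		struct kvm_lpage_info *linfo;
		int level = i + 1;
		int lpages = __kvm_mmu_slot_lpages(slot, slot->npages, level);

		linfo = __vcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
		/* ... allocation failure handling elided ... */
		slot->arch.lpage_info[i - 1] = linfo;

		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
			linfo[0].disallow_lpage = 1;
		if ((slot->base_gfn + slot->npages) &
		    (KVM_PAGES_PER_HPAGE(level) - 1))
			linfo[lpages - 1].disallow_lpage = 1;
	}

The fault path then consults these entries (via lpage_info_slot()) when
computing the maximum mapping level, so even if guest_memfd returns a 1G
folio for an aligned index, KVM will clamp the mapping level at a
misaligned slot edge.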