Yan Zhao writes:
> On Tue, May 06, 2025 at 12:22:47PM -0700, Ackerley Tng wrote:
>> Yan Zhao writes:
>>
>> >> >
>> >> >
>> >> > What options does userspace have in this scenario?
>> >> > It can't reduce the f
Yan Zhao writes:
>> >
>> >
>> > What options does userspace have in this scenario?
>> > It can't reduce the flag to KVM_GUEST_MEMFD_HUGE_2MB. Adjusting the
>> > gmem.pgoff isn't ideal either.
>> >
>> > What about something similar as below?
>> >
>> > diff --git a/virt/kvm/guest_memfd.c b/v
Sean Christopherson writes:
> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> KVM_LINK_GUEST_MEMFD will link a gmem fd's underlying inode to a new
>> file (and fd).
>>
>> Signed-off-by: Ackerley Tng
>> ---
>> include/uapi/linux/kvm.h
Yan Zhao writes:
> On Fri, Apr 25, 2025 at 03:45:20PM -0700, Ackerley Tng wrote:
>> Yan Zhao writes:
>>
>> > On Thu, Apr 24, 2025 at 11:15:11AM -0700, Ackerley Tng wrote:
>> >> Vishal Annapurve writes:
>> >>
>> >> > On Thu, Ap
Yan Zhao writes:
> On Thu, Apr 24, 2025 at 11:15:11AM -0700, Ackerley Tng wrote:
>> Vishal Annapurve writes:
>>
>> > On Thu, Apr 24, 2025 at 1:15 AM Yan Zhao wrote:
>> >>
>> >> On Thu, Apr 24, 2025 at 01:55:51PM +0800, Chenyi Qiang wrote:
>
+0800, Yan Zhao wrote:
>> > >> On Wed, Apr 23, 2025 at 03:02:02PM -0700, Ackerley Tng wrote:
>> > >>> Yan Zhao writes:
>> > >>>
>> > >>>> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
>> > >>>
Peter Xu writes:
> On Tue, Sep 10, 2024 at 11:43:57PM +0000, Ackerley Tng wrote:
>> @@ -1079,12 +1152,20 @@ static struct inode
>> *kvm_gmem_inode_make_secure_inode(const char *name,
>> if (err)
>> goto out;
>>
>> +err = -ENOMEM;
>
Yan Zhao writes:
> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
>> +/*
>> + * Allocates and then caches a folio in the filemap. Returns a folio with
>> + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
>> +
Yan Zhao writes:
> On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
>> guest_memfd files can always be mmap()ed to userspace, but
>> faultability is controlled by an attribute on the inode.
>>
>> Co-developed-by: Fuad Tabba
>> Signed-off-b
Peter Xu writes:
> On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
>> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode,
>> loff_t offset, loff_t len)
>> */
>> filemap_invalidate_lock(inode->i_mapping);
>>
ur reviews!
>
> On Tue, Sep 10, 2024 at 11:43:44PM +0000, Ackerley Tng wrote:
>> +static void kvm_gmem_init_mount(void)
>>
>> +{
>
Shivank Garg writes:
> Previously, guest-memfd allocations followed local NUMA node id in absence
> of process mempolicy, resulting in arbitrary memory allocation.
> Moreover, mbind() couldn't be used since memory wasn't mapped to userspace
> in the VMM.
>
> Enable NUMA policy support by implemen
Christoph Hellwig writes:
> On Tue, Apr 08, 2025 at 11:23:59AM +0000, Shivank Garg wrote:
>> From: Ackerley Tng
>>
>> Using guest mem inodes allows us to store metadata for the backing
>> memory on the inode. Metadata will be added in a later patch to support
>>
Christian Brauner writes:
> On Mon, Apr 07, 2025 at 04:46:48PM +0200, David Hildenbrand wrote:
>
>
>
> Fwiw, b4 allows you to specify dependencies so you can b4 shazam/am and it
> will pull in all prerequisite patches:
>
> b4 prep --edit-deps Edit the series dependencies in your defined
>
Ackerley Tng writes:
> Peter Xu writes:
>
>> On Tue, Sep 10, 2024 at 11:43:45PM +0000, Ackerley Tng wrote:
>>> +/**
>>> + * Removes folios in range [@lstart, @lend) from page cache of inode,
>>> updates
>>> + * inode metadata and
Peter Xu writes:
> On Tue, Sep 10, 2024 at 11:43:45PM +0000, Ackerley Tng wrote:
>> +/**
>> + * Removes folios in range [@lstart, @lend) from page cache of inode,
>> updates
>> + * inode metadata and hugetlb reservations.
>> + */
>> +static void kvm_gm
Peter Xu writes:
> On Tue, Sep 10, 2024 at 11:43:46PM +0000, Ackerley Tng wrote:
>> +static struct folio *kvm_gmem_hugetlb_alloc_folio(struct hstate *h,
>> + struct hugepage_subpool
>> *spool)
>> +{
>> +
Jun Miao writes:
> Hi Ackerley,
> Due to actual customer requirements (such as ByteDance), I have added
> support for NUMA policy based on your foundation.
> Standing on the shoulders of giants, please correct me if there is
> anything wrong.
>
> --- Thanks Jun.miao
>
>
Hi Jun,
Thank you for y
Amit Shah writes:
>>
>>
>> Thanks for all your help and comments during the guest_memfd upstream
>> calls, and thanks for the help from AMD.
>>
>> Extending mmap() support from Fuad with 1G page support introduces
>> more states that made it more complicated (at least for me).
>>
>> I'm mod
Amit Shah writes:
> Hey Ackerley,
Hi Amit,
> On Tue, 2024-09-10 at 23:43 +0000, Ackerley Tng wrote:
>> Hello,
>>
>> This patchset is our exploration of how to support 1G pages in
>> guest_memfd, and
>> how the pages will be used in Confidential VMs.
>
Patrick Roy writes:
> On Tue, 2024-10-08 at 20:56 +0100, Sean Christopherson wrote:
>> On Tue, Oct 08, 2024, Ackerley Tng wrote:
>>> Patrick Roy writes:
>>>> For the "non-CoCo with direct map entries removed" VMs that we at AWS
>>>> are go
Peter Xu writes:
> On Fri, Oct 11, 2024 at 11:32:11PM +0000, Ackerley Tng wrote:
>> Peter Xu writes:
>>
>> > On Tue, Sep 10, 2024 at 11:43:57PM +0000, Ackerley Tng wrote:
>> >> The faultability xarray is stored on the inode since faultability is a
>>
Peter Xu writes:
> On Tue, Sep 10, 2024 at 11:43:57PM +0000, Ackerley Tng wrote:
>> The faultability xarray is stored on the inode since faultability is a
>> property of the guest_memfd's memory contents.
>>
>> In this RFC, presence of an entry in the xarray
Patrick Roy writes:
> Hi Ackerley,
>
> On Thu, 2024-10-03 at 22:32 +0100, Ackerley Tng wrote:
>> Elliot Berman writes:
>>
>>> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>>>> Since guest_memfd now supports mmap(), folios have to be
Ackerley Tng writes:
> Elliot Berman writes:
>
>> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>>> Since guest_memfd now supports mmap(), folios have to be prepared
>>> before they are faulted into userspace.
>>>
>>> When m
Elliot Berman writes:
> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>> Since guest_memfd now supports mmap(), folios have to be prepared
>> before they are faulted into userspace.
>>
>> When memory attributes are switched between shared and private, t
Elliot Berman writes:
> On Tue, Sep 10, 2024 at 11:43:46PM +0000, Ackerley Tng wrote:
>> If HugeTLB is requested at guest_memfd creation time, HugeTLB pages
>> will be used to back guest_memfd.
>>
>> Signed-off-by: Ackerley Tng
>>
>>
>>
>>
Vishal Annapurve writes:
> On Wed, Sep 11, 2024 at 1:44 AM Ackerley Tng wrote:
>>
>> ...
>> +}
>> +
>> +static void kvm_gmem_evict_inode(struct inode *inode)
>> +{
>> + u64 flags = (u64)inode->i_private;
>
regardless of
faultability (as long as a HugeTLB page's worth of pages is
truncated).
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 678 +++--
1
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 146 +++---
.../x86_64/private_mem_conversions_test.sh| 3 +
2 files changed, 124 insertions(+), 25 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
b
A contiguous GPA range may not be contiguous in HVA.
This helper performs madvise, given a GPA range, by madvising in
blocks according to memslot configuration.
Signed-off-by: Ackerley Tng
---
tools/include/linux/kernel.h | 4 +--
.../testing/selftests/kvm/include/kvm_util.h
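The idea in the commit message above (a contiguous GPA range may map to several discontiguous HVA ranges, so madvise has to be issued block by block) can be sketched roughly as follows. This is a minimal illustration and not the selftest helper itself: `struct region`, `struct hva_chunk` and `gpa_range_to_hva_chunks()` are hypothetical names, and the region table stands in for the memslot configuration. The sketch only computes the per-block HVA ranges; a caller would pass each one to madvise().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memslot: a GPA range backed by contiguous HVA. */
struct region {
	uint64_t gpa;   /* first guest physical address covered */
	uint64_t size;  /* length in bytes */
	uint64_t hva;   /* host virtual address backing gpa */
};

/* One block that is contiguous in HVA. */
struct hva_chunk {
	uint64_t hva;
	uint64_t len;
};

/*
 * Split [gpa, gpa + len) into chunks that are contiguous in HVA, one per
 * region touched. Regions must be sorted by gpa and must not overlap.
 * Returns the number of chunks written to out, or -1 if part of the range
 * is not covered by any region or out is too small.
 */
static int gpa_range_to_hva_chunks(const struct region *regions, size_t nr,
				   uint64_t gpa, uint64_t len,
				   struct hva_chunk *out, size_t max_out)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr && len > 0; i++) {
		const struct region *r = &regions[i];
		uint64_t off, chunk;

		if (gpa < r->gpa)
			return -1;	/* hole before this region */
		if (gpa >= r->gpa + r->size)
			continue;	/* range starts after this region */
		if (n == max_out)
			return -1;	/* caller's buffer is full */

		off = gpa - r->gpa;
		chunk = r->size - off;
		if (chunk > len)
			chunk = len;

		/* A caller would madvise() this HVA block here. */
		out[n].hva = r->hva + off;
		out[n].len = chunk;
		n++;

		gpa += chunk;
		len -= chunk;
	}

	return len == 0 ? (int)n : -1;
}
```

With two adjacent regions backed by unrelated HVA ranges, a GPA range that straddles the boundary yields two chunks, one per region, which is why a single madvise() call over the whole range would not be correct.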
nctions in
vm_mem_add() to flexibly build up struct userspace_mem_region before
finally adding the region to the vm with vm_mem_region_add().
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/include/kvm_util.h | 29 +-
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing
CONFIG_GUP_TEST provides userspace with an ioctl to invoke
pin_user_pages(), and this test uses the ioctl to pin pages, to check
that memory attributes cannot be set to private if shared pages are
pinned.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/kvm/Makefile | 1
Minimal test for guest_memfd to test that when memory is marked shared
in a VM, the host can read and write to it via an mmap()ed address,
and the guest can also read and write to it.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm
Note in comments why madvise() is not needed before setting memory to
private.
Signed-off-by: Ackerley Tng
---
.../selftests/kvm/x86_64/private_mem_kvm_exits_test.c | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64
s are still mapped, setting memory attributes
will fail.
5. Test that madvise(MADV_REMOVE) can be used to remove pages from
guest_memfd, forcing zeroing of those pages before the next time
the pages are faulted in.
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/guest_memfd_t
No functional change intended.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/kvm/include/kvm_util.h | 14 +++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h
b/tools/testing/selftests/kvm/include/kvm_util.h
h this call will get a SIGBUS.
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
include/linux/kvm_host.h | 1 +
virt/kvm/guest_memfd.c | 207 +++
virt/kvm/kvm_main.c
an be used to mark whether the folio is ready for shared OR
private use.
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 131 -
virt/kvm/kvm_main.c| 2 +
virt/kvm/kvm_mm.h | 7 +++
3 files changed, 139 insertions(+), 1 deletion(-)
off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 22 ++
1 file changed, 22 insertions(+)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fc2483e35876..1d4dfe0660ad 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1256,6 +1256,23 @@ static struc
.
Also store struct kvm_gmem_hugetlb in struct kvm_gmem_hugetlb as a
pointer. inode->i_mapping->i_private_data.
Co-developed-by: Fuad Tabba
Signed-off-by: Fuad Tabba
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
-
guest_memfd files can always be mmap()ed to userspace, but
faultability is controlled by an attribute on the inode.
Co-developed-by: Fuad Tabba
Signed-off-by: Fuad Tabba
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 46
to build page splitting/merging
functionality before allowing guest_memfd files to be mmap()ed.
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
virt/kvm/guest_memfd.c | 299
These functions are introduced in hugetlb.c so the private
hugetlb_lock can be accessed.
hugetlb_lock is reused for this PoC, but a separate lock should be
used in a future revision to avoid interference due to hash collisions
with HugeTLB's usage of this lock.
Co-developed-by: Ackerle
These functions will be used by guest_memfd to split/reconstruct
HugeTLB pages.
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
include/linux/hugetlb.h | 15 +++
mm/hugetlb.c| 8
These functions will need to be used by guest_memfd when
splitting/reconstructing HugeTLB pages.
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
include/linux/hugetlb.h | 14 ++
mm/hugetlb_vmemmap.h
Using HugeTLB as the huge page allocator for guest_memfd allows reuse
of HugeTLB's reporting mechanism.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/guest_memfd_hugetlb_reporting_test.c | 222 ++
2 files changed, 223 inser
Add private_mem_conversions_test.sh to automate testing of different
combinations of private_mem_conversions_test.
Signed-off-by: Ackerley Tng
---
.../x86_64/private_mem_conversions_test.sh| 88 +++
1 file changed, 88 insertions(+)
create mode 100755
tools/testing
Update private_mem_conversions_test for various private memory backing
source types.
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 28 ++-
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64
Adds support for various types of backing sources for private
memory (in the sense of confidential computing), similar to the
backing sources available for shared memory.
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/include/test_util.h | 16
tools/testing/selftests/kvm/lib
Add tests for 2MB and 1GB page sizes, and update the invalid flags
test for the new KVM_GUEST_MEMFD_HUGETLB flag.
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/guest_memfd_test.c | 45 ++-
1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/tools
When a hugetlb guest_memfd is requested, the requested size should be
aligned to the size of the hugetlb page requested.
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 15 ++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm
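The alignment requirement stated in the commit message above can be illustrated with a small check. This is a sketch under assumptions, not the code from the patch: `gmem_size_is_hugepage_aligned()` is a hypothetical helper, and it assumes the huge page size is a power of two (as 2MB and 1GB are).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: returns true when size is
 * a non-zero multiple of huge_page_size. huge_page_size must be a non-zero
 * power of two; anything else is rejected.
 */
static bool gmem_size_is_hugepage_aligned(uint64_t size, uint64_t huge_page_size)
{
	if (size == 0 || huge_page_size == 0)
		return false;

	/* Reject page sizes that are not a power of two. */
	if (huge_page_size & (huge_page_size - 1))
		return false;

	/* For a power-of-two page size, the low bits must all be clear. */
	return (size & (huge_page_size - 1)) == 0;
}
```

For example, a 4MB request is aligned to a 2MB huge page size, while a 3MB request is not and would be expected to be rejected at creation time.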
If HugeTLB is requested at guest_memfd creation time, HugeTLB pages
will be used to back guest_memfd.
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_memfd.c | 252 ++---
1 file changed, 239 insertions(+), 13 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b
First stage of hugetlb support: add initialization and cleanup
routines.
After guest_mem was massaged to use guest_mem inodes instead of
anonymous inodes in an earlier patch, the .evict_inode handler can now
be overridden to do hugetlb metadata cleanup.
Signed-off-by: Ackerley Tng
---
include
, and
metadata about backing memory is not unique to a specific binding and
struct kvm.
Signed-off-by: Ackerley Tng
---
include/uapi/linux/magic.h | 1 +
virt/kvm/guest_memfd.c | 119 ++---
2 files changed, 100 insertions(+), 20 deletions(-)
diff --git a/include
This will be used by guest_memfd in a later patch.
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c| 33 +
include/linux/hugetlb.h | 3 +++
mm/hugetlb.c| 21 +
3 files changed, 29 insertions(+), 28 deletions(-)
diff --git a
This will be used by guest_memfd in a later patch.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c| 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 9ef1adbd3207..4d47bf94c211
__hugetlb_acct_memory() today does more than just memory
accounting. When there are insufficient HugeTLB pages,
__hugetlb_acct_memory() will attempt to get surplus pages.
This change adds a flag to disable getting surplus pages if there are
insufficient HugeTLB pages.
Signed-off-by: Ackerle
This will allow hugetlb subpools to be used by guest_memfd.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 3 +++
mm/hugetlb.c| 6 ++
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e4a05a421623
This will allow preparation steps to be shared.
Signed-off-by: Ackerley Tng
---
include/linux/mm.h | 1 +
mm/truncate.c | 26 --
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4b238a20b76..ffb4788295b4
hugetlb_alloc_folio() allocates a hugetlb folio without handling
reservations in the vma and subpool, since some of those reservation
concepts are hugetlbfs-specific.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 12
mm/hugetlb.c| 144
Reduce dependence on vma since the use of huge_node() assumes
that the mempolicy is stored in a specific place in the inode,
accessed via the vma.
Signed-off-by: Ackerley Tng
---
mm/hugetlb.c | 55 ++--
1 file changed, 23 insertions(+), 32
Reducing dependence on vma avoids the hugetlb-specific assumption of
where the mempolicy is stored. This will open up other ways of using
hugetlb.
Signed-off-by: Ackerley Tng
---
mm/hugetlb.c | 37 +++--
1 file changed, 23 insertions(+), 14 deletions(-)
diff
o the current node
id, which was not previously enforced.
alloc_pages_mpol_noprof() is the last remaining direct user of
policy_nodemask(). All its callers begin with nid being the current
node id as well. More refactoring is required to simplify that.
Signed-off-by: Ackerley Tng
---
in
If avoid_reserve is true, gbl_chg is not used anyway, so there is no
point in setting gbl_chg.
Signed-off-by: Ackerley Tng
---
mm/hugetlb.c | 10 --
1 file changed, 10 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 597102ed224b..5cf7fb117e9d 100644
--- a/mm/hugetlb.c
+++ b
!available_huge_pages(h)
can be combined into
(avoid_reserve || !vma_has_reserves(vma, chg))
&& !available_huge_pages(h).
Applying de Morgan's theorem,
avoid_reserve || !vma_has_reserves(vma, chg)
is the negation of
!avoid_reserve && vma_has_reserves(vma, chg),
hence the simplifica
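The boolean manipulation in the commit message above can be written out step by step. The single-letter names are shorthand introduced here, not taken from the patch: a for avoid_reserve, r for vma_has_reserves(vma, chg), and p for available_huge_pages(h). The first line assumes that the two conditions being combined are (!r && !p) and (a && !p); the start of the message is cut off, so that starting point is an assumption.

```latex
\begin{align*}
(\lnot r \land \lnot p) \lor (a \land \lnot p)
  &\equiv (a \lor \lnot r) \land \lnot p
  && \text{(distributivity; assumed starting point)} \\
a \lor \lnot r
  &\equiv \lnot(\lnot a \land r)
  && \text{(de Morgan)} \\
(a \lor \lnot r) \land \lnot p
  &\equiv \lnot(\lnot a \land r) \land \lnot p
\end{align*}
```

In words: the combined condition holds exactly when it is not the case that reserves may be used (!avoid_reserve && vma_has_reserves(vma, chg)), and no huge pages are available.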
ng also takes into account the allocation's request
parameter avoid_reserve, which helps to further simplify the calling
function alloc_hugetlb_folio().
Signed-off-by: Ackerley Tng
---
mm/hugetlb.c | 16 +---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/hu
private conversion flow
+ Requiring user to request kernel to unmap pages from userspace using
madvise(MADV_DONTNEED)
+ Failing conversion on elevated mapcounts/pincounts/refcounts
+ Process of splitting/reconstructing page
+ Anything else!
[1]
https://lore.kernel.org/all/20240829-gu