The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz"
and will appear at g...@bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.5
------>
commit a8cc8c6ac35ccc81b8c16425596cac77df058dee
Author: Anthony Yznaga <anthony.yzn...@oracle.com>
Date:   Thu Jan 26 15:41:44 2023 -0800
    oracle/mm: avoid early cow when copying ptes for MADV_DOEXEC

    When a VMA preserved via MADV_DOEXEC is copied to the new mm during
    exec, copy_page_range() is called to copy the pagetable entries.
    Commit 70e806e4 ("mm: Do early cow for pinned pages during fork()
    for ptes") changed how pinned pages encountered by copy_page_range()
    are handled. A copy of the page is made immediately rather than
    write-protecting it for later COW. This breaks MADV_DOEXEC when the
    memory to preserve is pinned (e.g. the guest memory of a VFIO-enabled
    guest). Ensure that this page copying will not be done when copying
    pagetable entries for preservation by adding a check for VM_EXEC_KEEP.

    Orabug: 35054621

    Signed-off-by: Anthony Yznaga <anthony.yzn...@oracle.com>
    Reviewed-by: Liam R. Howlett <liam.howl...@oracle.com>

    https://virtuozzo.atlassian.net/browse/VSTOR-96305

    Porting notes:
    RedHat has applied
      rh commit: d8f21270d397 ("mm/rmap: split page_dup_rmap() into
                 page_dup_file_rmap() and page_try_dup_anon_rmap()")
      ms commit: fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into
                 page_dup_file_rmap() and page_try_dup_anon_rmap()")
    and
      rh commit: 85f85f728ec6 ("mm/memory: slightly simplify copy_present_pte()")
      ms commit: b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
    So the check from copy_present_page() has been moved to
    copy_present_pte().

    (cherry picked from Oracle commit a904d4d4c24126a64b6d8aa0658425f4964ce674)
    Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>

    Feature: oracle/mm: MADV_DOEXEC madvise() flag
---
 mm/memory.c |  6 +++++-
 mm/mmap.c   | 10 +++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 88b1aead060f..ebd08a1f2c9a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -915,9 +915,12 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	unsigned long vm_flags = src_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
+	bool is_exec_keep;
 
 	page = vm_normal_page(src_vma, addr, pte);
 	if (page && PageAnon(page)) {
+		is_exec_keep = dst_vma->vm_flags & VM_EXEC_KEEP ? true : false;
+
 		/*
 		 * If this page may have been pinned by the parent process,
 		 * copy the page immediately for the child so that we'll always
@@ -925,7 +928,8 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		 * future.
 		 */
 		get_page(page);
-		if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+		if (unlikely(page_try_dup_anon_rmap(page, false, src_vma)) &&
+		    !is_exec_keep) {
 			/* Page maybe pinned, we have to copy. */
 			put_page(page);
 			return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
diff --git a/mm/mmap.c b/mm/mmap.c
index f87d284bd17b..9bb2382d9101 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3300,10 +3300,11 @@ int vma_dup(struct vm_area_struct *old_vma, struct mm_struct *mm)
 
 	/*
 	 * Clear functionality that should not carry over to the new
-	 * process.any memory locking, userfaultfd, and preservation over
-	 * exec flags.
+	 * process. Note that VM_EXEC_KEEP is cleared later to allow
+	 * code called by copy_page_range to infer that the copying is
+	 * for preserving over exec and not for process forking.
 	 */
-	vma->vm_flags &= ~(VM_LOCKED|VM_LOCKONFAULT|VM_UFFD_MISSING|VM_UFFD_WP|VM_EXEC_KEEP);
+	vma->vm_flags &= ~(VM_LOCKED|VM_LOCKONFAULT|VM_UFFD_MISSING|VM_UFFD_WP);
 	vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 	__insert_vm_struct(mm, vma);
 
@@ -3318,6 +3319,9 @@ int vma_dup(struct vm_area_struct *old_vma, struct mm_struct *mm)
 	old_vma->vm_flags &= ~VM_ACCOUNT;
 
 	ret = copy_page_range(vma, old_vma);
+
+	vma->vm_flags &= ~VM_EXEC_KEEP;
+
 	return ret;
 
 fail_nomem_anon_vma_fork:
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
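
For readers unfamiliar with the feature, here is a minimal userspace sketch of the
scenario the fix targets: a process maps and pins guest-like memory (e.g. by
registering it with VFIO), marks it with MADV_DOEXEC so the VMA is preserved over
exec, and then execs. This is an illustration only, not part of the patch:
MADV_DOEXEC is out-of-tree, so the numeric value of the flag, the elided VFIO
registration step, and the "./new-qemu-binary" path are all assumptions.

/*
 * Hypothetical usage sketch only -- not part of the patch above.
 * MADV_DOEXEC is out-of-tree, so the value below is assumed for
 * illustration and is not present in the uapi headers.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_DOEXEC
#define MADV_DOEXEC	32	/* assumed value, illustration only */
#endif

int main(int argc, char *argv[])
{
	size_t len = 64UL << 20;	/* 64 MB of guest-like memory */
	void *mem;

	/* Anonymous memory that will later be pinned (e.g. via a VFIO DMA map). */
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(mem, 0, len);		/* fault the pages in */

	/* ... register 'mem' with VFIO here so its pages get pinned ... */

	/* Ask the kernel to preserve this VMA across exec. */
	if (madvise(mem, len, MADV_DOEXEC) != 0) {
		perror("madvise(MADV_DOEXEC)");
		return 1;
	}

	/*
	 * On exec, copy_page_range() copies the preserved VMA into the new
	 * mm.  With the fix above, the pinned pages are not duplicated via
	 * early COW, so the new program (and VFIO) keep operating on the
	 * same physical pages.
	 */
	execv("./new-qemu-binary", argv);	/* hypothetical path */
	perror("execv");
	return 1;
}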