sashiko.dev --
https://sashiko.dev/#/patchset/[email protected] -- wrote:
> + if (enable_rwp)
> + mm_cp_flags = MM_CP_UFFD_RWP;
> + else
> + mm_cp_flags = MM_CP_UFFD_RWP_RESOLVE | MM_CP_TRY_CHANGE_WRITABLE;
>
> Does this unconditionally apply MM_CP_TRY_CHANGE_WRITABLE to the entire range?
Confirmed bug. RWP registration does not require VM_WRITE, so
mmap(PROT_READ)
UFFDIO_REGISTER(MODE_RWP)
UFFDIO_RWPROTECT(<range>, disable)
trips WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)) inside
maybe_change_pte_writable() on resolve.
The flag belongs inside the iteration, gated on
vma_wants_manual_pte_write_upgrade(vma), matching mwriteprotect_range()
(mm/userfaultfd.c:1003) and userfaultfd_clear_vma() (mm/userfaultfd.c:2182):
for_each_vma_range(vmi, dst_vma, end) {
unsigned long vma_start = max(dst_vma->vm_start, start);
unsigned long vma_end = min(dst_vma->vm_end, end);
unsigned int flags = mm_cp_flags;
if (!enable_rwp && vma_wants_manual_pte_write_upgrade(dst_vma))
flags |= MM_CP_TRY_CHANGE_WRITABLE;
change_protection(&tlb, dst_vma, vma_start, vma_end, flags);
}
Will fold for v2.
> Since change_protection() walks and modifies page tables here, does this
> need to call vma_start_write(vma) first?
No.
This is the same locking pattern as the pre-existing uffd_wp_range() call
that the hunk replaces -- mmap_write_lock without vma_start_write(), which
remains safe for the same reasons:
- mmap_write_lock excludes anything taking mmap_read_lock, including
MADV_DONTNEED and the other PTE-page-freeing paths (try_to_free_pte()
runs under mmap_read_lock + per-VMA locking, never standalone).
- The remaining concurrent reader is the per-VMA-locked page fault
(lock_vma_under_rcu()), which walks PTEs under the PTE lock.
change_protection() also takes the PTE lock when updating, so the
two serialise. A fault that observes a transient pre-resolve PTE
just produces a normal fault delivery and resolves correctly.
--
Kiryl Shutsemau / Kirill A. Shutemov