On Thu, Apr 30, 2026 at 05:28:17PM +0100, Kiryl Shutsemau wrote:
> sashiko.dev --
> https://sashiko.dev/#/patchset/[email protected] --
> wrote:
>
> > > @@ -1084,9 +1092,29 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
> > >  		pte_t pte, unsigned long addr, int nr)
> > >  {
> > >  	struct mm_struct *src_mm = src_vma->vm_mm;
> > > +	bool writable;
> > > +
> > > +	/*
> > > +	 * Snapshot writability before the RWP-disarm rewrite below: when the
> > > +	 * child is not RWP-armed, pte_modify(pte, dst_vma->vm_page_prot) can
> > > +	 * silently drop _PAGE_RW from a resolved (no-marker) writable PTE,
> > > +	 * so a later pte_write(pte) check would skip the COW wrprotect and
> > > +	 * leave the parent writable over a folio shared with the child.
> > > +	 */
> > > +	writable = pte_write(pte);
> > > +
> > > +	/*
> > > +	 * Child is not RWP-armed: restore accessible protection so the
> > > +	 * inherited PAGE_NONE does not cost a fault on first read.
> > > +	 */
> > > +	if (!userfaultfd_protected(dst_vma)) {
> > > +		if (userfaultfd_rwp(src_vma))
> > > +			pte = pte_modify(pte, dst_vma->vm_page_prot);
> > > +		pte = pte_clear_uffd(pte);
> > > +	}
> >
> > Does this unconditional pte_modify() create invalid clean and writable
> > PTEs for shared mappings?
> >
> > Without checking pte_uffd(pte) first, this blindly modifies every
> > present PTE if the source VMA had RWP enabled. For shared writable
> > mappings, vm_page_prot includes _PAGE_RW. If a PTE was clean and
> > mapped read-only to intercept the first write for filesystem
> > dirty-tracking, pte_modify() forces the write bit on while preserving
> > the clean state.
>
> Two reasons this is safe in practice:
>
> 1. RWP cannot reach a backing that uses the clean-RO + page_mkwrite()
>    dirty-tracking pattern. vma_can_userfault() admits RWP only on anon,
>    shmem and hugetlbfs (the ones with vm_uffd_ops); regular file-backed
>    shared mappings are rejected at register time.
>
> 2. For backings that *do* support shared writable + dirty tracking
>    (i.e. the ones RWP isn't enabled on), the fs sets vma->vm_page_prot
>    to RO precisely so the first write traps page_mkwrite().
>    pte_modify(pte, dst_vma->vm_page_prot) would not add _PAGE_RW in
>    that configuration either.
>
> > Hardware writes could then bypass page_mkwrite(), leaving the
> > filesystem unaware that the page is dirty, and potentially causing
> > silent data loss upon reclaim.
>
> > Could this also overwrite other important PTE states like NUMA
> > hinting or soft-dirty tracking?
>
> pte_modify() on the supported architectures preserves _PAGE_CHG_MASK,
> which covers _PAGE_DIRTY, _PAGE_ACCESSED, _PAGE_SOFT_DIRTY, and the
> PFN. NUMA hinting is encoded in the protection bits, so a transient
> hint state on a resolved (no-marker) PTE would be lost across this
> rewrite, but that's just a re-prime on the next NUMA scan, not a
> correctness issue.
>
> > @@ -298,6 +298,16 @@ static __always_inline void change_present_ptes(struct mmu_gather *tlb,
> >  		else if (uffd_wp_resolve || uffd_rwp_resolve)
> >  			ptent = pte_clear_uffd(ptent);
> >
> > +		/*
> > +		 * The uffd bit on a VM_UFFD_RWP VMA carries PROT_NONE
> > +		 * semantics. If mprotect() or NUMA hinting changed the
> > +		 * base protection, restore PAGE_NONE so the PTE still
> > +		 * traps on any access. pte_modify() preserves
> > +		 * _PAGE_UFFD.
> > +		 */
> > +		if (userfaultfd_rwp(vma) && pte_uffd(ptent))
> > +			ptent = pte_modify(ptent, PAGE_NONE);
> >
> > For VMAs registered with both UFFD WP and RWP, could this silently
> > upgrade Write-Protected pages to Read-Write-Protected?
>
> VM_UFFD_WP and VM_UFFD_RWP are mutually exclusive per VMA.
> The combination is rejected at registration time:
>
> fs/userfaultfd.c, userfaultfd_register():
>
> 	if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP))
> 		goto out; /* -EINVAL */
>
> and re-registration cannot drop either bit (also -EBUSY in the same
> function), so a PTE in a WP-armed VMA can never satisfy
> userfaultfd_rwp(vma); the new branch never fires there.
>
> > [ ... ]
> >
> > +	/* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
> > +	if (userfaultfd_rwp(dst_vma)) {
> > +		orig_dst_pte = pte_modify(orig_dst_pte, PAGE_NONE);
> > +		orig_dst_pte = pte_mkuffd(orig_dst_pte);
> > +	}
> > +
> >  	set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
> >
> > Could applying PAGE_NONE unconditionally to newly moved pages trap
> > applications in an infinite fault loop?
>
> No -- the post-MOVE access is delivered as a normal RWP fault. In sync
> mode it goes to the registered handler, which resolves it with
> UFFDIO_RWPROTECT clearing MODE_RWP; in async mode the kernel resolves
> it in-kernel and the faulting thread continues. There is no loop.
>
> The semantics here are intentional: a VM_UFFD_RWP VMA has the contract
> that every present PTE is either an active marker or a
> tracked-and-resolved PTE whose next access will re-trap. UFFDIO_MOVE
> into such a VMA must keep that contract, otherwise the moved-in page
> would be a silent hole in the working-set view. UFFDIO_MOVE has no
> mode flag for "skip protection", by design -- the same way it has no
> flag to skip WP arming if dst_vma were WP-armed (and the equivalent
> could be added there if we ever decide UFFDIO_MOVE should preserve
> markers in WP VMAs too).
Oopsie. I put it in reply to the wrong patch. It was supposed to be for
06/14.

-- 
Kiryl Shutsemau / Kirill A. Shutemov

