On Mon, Feb 17, 2025 at 07:08:28PM +0000, Matthew Wilcox (Oracle) wrote:

Hi Matthew,

> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault.  s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <wi...@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -     unsigned long physpage = page_to_phys(page);
> -     pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -     if (pte_write(__pte) && PageDirty(page))
> -             __pte = pte_mkdirty(__pte);
> -     return __pte;
> +     return mk_pte_phys(page_to_phys(page), pgprot);
>  }

With the above, the implicit dirtying of hugetlb PTEs in make_huge_pte()
(a result of mk_huge_pte() -> mk_pte()) is removed:

static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
                bool try_mkwrite)
{
        ...
        if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
                entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
                                         vma->vm_page_prot)));
        } else {
                entry = huge_pte_wrprotect(mk_huge_pte(page,
                                           vma->vm_page_prot));
        }
        ...
}
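
To make the difference concrete, here is a toy userspace model, not kernel
code. The toy_pte struct and every toy_*/old_*/new_* helper are made up for
illustration. It also assumes that write-protecting an entry clears only the
write bit and leaves the dirty bit alone, which should be checked against the
real s390 helpers:

```c
#include <stdbool.h>

/* Toy stand-in for a PTE: only the two bits relevant here. */
struct toy_pte {
	bool write;
	bool dirty;
};

/* Old s390 mk_pte(): dirty is inherited from the page if the prot is writable. */
static struct toy_pte old_mk_pte(bool prot_write, bool page_dirty)
{
	struct toy_pte pte = { .write = prot_write, .dirty = false };

	if (pte.write && page_dirty)
		pte.dirty = true;
	return pte;
}

/* New mk_pte(): no implicit dirty propagation. */
static struct toy_pte new_mk_pte(bool prot_write, bool page_dirty)
{
	struct toy_pte pte = { .write = prot_write, .dirty = false };

	(void)page_dirty;	/* deliberately ignored after the patch */
	return pte;
}

/* Assumption: wrprotect drops write permission but keeps the dirty bit. */
static struct toy_pte toy_wrprotect(struct toy_pte pte)
{
	pte.write = false;
	return pte;
}

/* Model of make_huge_pte() with either mk_pte() variant plugged in. */
static struct toy_pte toy_make_huge_pte(bool use_old, bool try_mkwrite,
					bool vm_write, bool prot_write,
					bool page_dirty)
{
	struct toy_pte pte = use_old ? old_mk_pte(prot_write, page_dirty)
				     : new_mk_pte(prot_write, page_dirty);

	if (try_mkwrite && vm_write) {
		pte.dirty = true;	/* huge_pte_mkdirty() */
		pte.write = true;	/* huge_pte_mkwrite() */
	} else {
		pte = toy_wrprotect(pte);
	}
	return pte;
}
```

In this model the try_mkwrite branch is unaffected, since it dirties the
entry explicitly. Only the else branch changes: for a dirty page and a
writable vm_page_prot, the old variant returns a dirty entry and the new one
returns a clean entry.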

What is your take on this?

>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>  
>       if (write)
>               entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +     else if (pte_write(entry) && folio_test_dirty(folio))
> +             entry = pte_mkdirty(entry);
>       if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>               entry = pte_mkuffd_wp(entry);
>       /* copy-on-write page */
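
For comparison, the dirty handling in this hunk can be sketched the same way.
Again this is a toy userspace model with made-up names (toy_entry,
toy_set_pte_range), and maybe_mkwrite() is reduced to "set the write bit if
the VMA is writable", which is a simplifying assumption:

```c
#include <stdbool.h>

/* Toy stand-in for a PTE: only the write and dirty bits. */
struct toy_entry {
	bool write;
	bool dirty;
};

/* Models only the dirty/write decision of the patched set_pte_range(). */
static struct toy_entry toy_set_pte_range(bool write_fault, bool prot_write,
					  bool vm_write, bool folio_dirty)
{
	struct toy_entry entry = { .write = prot_write, .dirty = false };

	if (write_fault) {
		entry.dirty = true;		/* pte_mkdirty() */
		if (vm_write)			/* simplified maybe_mkwrite() */
			entry.write = true;
	} else if (entry.write && folio_dirty) {
		entry.dirty = true;		/* the two added lines */
	}
	return entry;
}
```

Here a read fault on an already dirty folio with a writable protection yields
a dirty entry, so a later write does not need another fault just to set the
dirty bit. A read fault on a clean folio still yields a clean entry.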

Thanks!
