Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
-    folio_remove_rmap_pte(folio, page, vma);
+    folio_remove_rmap_ptes(folio, page, nr, vma);
+
+    /* Only sanity-check the first page in a batch. */
     if (unlikely(page_mapcount(page) < 0))
         print_bad_pte(vma, addr, ptent, page);

Is there a case for eithe…
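For readers skimming the archive, the hunk above is the rmap side of the
batching: a single folio_remove_rmap_ptes() call covers the whole run of PTEs,
and only the first page's mapcount gets sanity-checked. A minimal sketch of
that pattern as a standalone helper (the helper name and parameter list are
illustrative, not taken from the patch):

/*
 * Illustrative helper (hypothetical name): drop the rmap for 'nr'
 * consecutively PTE-mapped pages of the same folio and sanity-check
 * only the first page, mirroring the hunk quoted above.
 */
static inline void zap_batch_remove_rmap(struct folio *folio, struct page *page,
		unsigned int nr, struct vm_area_struct *vma,
		unsigned long addr, pte_t ptent)
{
	folio_remove_rmap_ptes(folio, page, nr, vma);

	/* Only sanity-check the first page in a batch. */
	if (unlikely(page_mapcount(page) < 0))
		print_bad_pte(vma, addr, ptent, page);
}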

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread Yin, Fengwei
On 1/31/2024 6:30 PM, David Hildenbrand wrote:
On 31.01.24 03:30, Yin Fengwei wrote:
On 1/29/24 22:32, David Hildenbrand wrote:
+static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
+		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
+{
+	pte_t pte, tmp_pte…
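The quoted definition is cut short by the archive preview. As a rough sketch of
the kind of generic fallback being discussed, assuming it simply loops over
ptep_get_and_clear_full() and folds the dirty/accessed bits of every cleared
entry into the PTE it returns (a sketch, not a verbatim copy of the patch):

static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
{
	pte_t pte, tmp_pte;

	pte = ptep_get_and_clear_full(mm, addr, ptep, full);
	while (--nr) {
		ptep++;
		addr += PAGE_SIZE;
		tmp_pte = ptep_get_and_clear_full(mm, addr, ptep, full);
		/* Fold dirty/accessed into the single PTE we hand back. */
		if (pte_dirty(tmp_pte))
			pte = pte_mkdirty(pte);
		if (pte_young(tmp_pte))
			pte = pte_mkyoung(pte);
	}
	return pte;
}

Returning one combined PTE lets the caller mark the folio dirty/accessed once
per batch rather than once per page.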

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread Ryan Roberts
On 31/01/2024 10:21, David Hildenbrand wrote:
>
>>> +
>>> +#ifndef clear_full_ptes
>>> +/**
>>> + * clear_full_ptes - Clear PTEs that map consecutive pages of the same
>>> folio.
>>
>> I know it's implied from "pages of the same folio" (and even more so for the
>> above variant due to mention of a…

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 03:30, Yin Fengwei wrote:
On 1/29/24 22:32, David Hildenbrand wrote:
+static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
+		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
+{
+	pte_t pte, tmp_pte;
+
+	pte = ptep_get_and_clear_full…

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
+
+#ifndef clear_full_ptes
+/**
+ * clear_full_ptes - Clear PTEs that map consecutive pages of the same folio.

I know it's implied from "pages of the same folio" (and even more so for the
above variant due to mention of access/dirty), but I wonder if it's useful to
explicitly state that "all pt…
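For comparison, a sketch of what a generic clear_full_ptes() fallback along
these lines could look like, i.e. the variant that clears the entries without
collecting dirty/accessed bits (illustrative only, not quoted from the patch):

static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
		pte_t *ptep, unsigned int nr, int full)
{
	for (;;) {
		/* Old PTE value is discarded; no dirty/accessed collection. */
		ptep_get_and_clear_full(mm, addr, ptep, full);
		if (--nr == 0)
			break;
		ptep++;
		addr += PAGE_SIZE;
	}
}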

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-30 Thread Yin Fengwei
On 1/29/24 22:32, David Hildenbrand wrote:
> +static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
> +		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
> +{
> +	pte_t pte, tmp_pte;
> +
> +	pte = ptep_get_and_clear_full(mm, addr, ptep, full);
> +	wh…

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-30 Thread Ryan Roberts
On 29/01/2024 14:32, David Hildenbrand wrote:
> Similar to how we optimized fork(), let's implement PTE batching when
> consecutive (present) PTEs map consecutive pages of the same large
> folio.
>
> Most infrastructure we need for batching (mmu gather, rmap) is already
> there. We only have to ad…

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-30 Thread David Hildenbrand
Re-reading the docs myself:

+#ifndef get_and_clear_full_ptes
+/**
+ * get_and_clear_full_ptes - Clear PTEs that map consecutive pages of the same
+ * folio, collecting dirty/accessed bits.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first pag…

[PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-29 Thread David Hildenbrand
Similar to how we optimized fork(), let's implement PTE batching when
consecutive (present) PTEs map consecutive pages of the same large folio.

Most infrastructure we need for batching (mmu gather, rmap) is already
there. We only have to add get_and_clear_full_ptes() and clear_full_ptes().
Similar…
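To make the zap-side flow concrete, here is an illustrative caller-side sketch.
Assumptions: the batch length 'nr' has already been determined by the caller,
get_and_clear_full_ptes() and folio_remove_rmap_ptes() behave as described in
this thread, and tlb_remove_tlb_entries() is the range variant added earlier in
the same series. The function name and argument list are hypothetical:

/*
 * Sketch: zap 'nr' present PTEs that map consecutive pages of the same
 * large folio in one go.
 */
static inline void zap_present_folio_ptes_sketch(struct mmu_gather *tlb,
		struct vm_area_struct *vma, struct folio *folio,
		struct page *page, pte_t *pte, unsigned long addr,
		unsigned int nr, int full)
{
	struct mm_struct *mm = vma->vm_mm;
	pte_t ptent;

	/* Clear all 'nr' entries at once, collecting dirty/accessed bits. */
	ptent = get_and_clear_full_ptes(mm, addr, pte, nr, full);
	if (pte_dirty(ptent))
		folio_mark_dirty(folio);
	if (pte_young(ptent))
		folio_mark_accessed(folio);

	/* Queue TLB invalidation for the whole cleared range. */
	tlb_remove_tlb_entries(tlb, pte, nr, addr);

	/* Drop the rmap for all 'nr' pages of the folio with one call. */
	folio_remove_rmap_ptes(folio, page, nr, vma);
}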