On 10/3/19 5:21 PM, Peter Zijlstra wrote:
On Thu, Oct 03, 2019 at 09:11:45AM +0200, Peter Zijlstra wrote:
On Wed, Oct 02, 2019 at 10:33:15PM -0300, Leonardo Bras wrote:


....



And I still think all that is wrong; you really shouldn't need to wait on
munmap().


I do have a patch that does something like that.


+#define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL
+static inline pmd_t pmdp_huge_get_and_clear_full(struct mm_struct *mm,
+                                                unsigned long address, pmd_t *pmdp,
+                                                int full)
+{
+       bool serialize = true;
+       /*
+        * We don't need to serialize against a lockless page table walk if
+        * we are clearing the pmd due to task exit. For regular munmap, we
+        * still need to serialize due to the possibility of MADV_DONTNEED
+        * running in parallel with a page fault which can convert a THP pte
+        * entry to a pointer to level 4 table.
+        * Here MADV_DONTNEED is removing the THP entry and the fault is filling
+        * a level 4 pte.
+        */
+       if (full == 1)
+               serialize = false;
+       return __pmdp_huge_get_and_clear(mm, address, pmdp, serialize);
 }


If it is a fullmm flush, we can skip the serialize step, but for everything else we need to serialize. MADV_DONTNEED is another case that needs it. I haven't sent this yet because I was trying to look at what it takes to switch that MADV variant to take mmap_sem in write mode.
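
To make the rule concrete, here is a minimal userspace sketch of the decision the patch encodes. It is not kernel code: mock_pmd_t, the mock_* functions, and the serialize_calls counter are hypothetical stand-ins invented for illustration, and the "serialize" step only bumps a counter instead of waiting for lockless walkers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pmd_t; not the kernel type. */
typedef struct { unsigned long val; } mock_pmd_t;

/* Counts how many times the mock serialize step ran. */
static int serialize_calls;

/* Stand-in for serializing against lockless page table walkers. */
static void mock_serialize_against_walkers(void)
{
	serialize_calls++;
}

/* Models the inner helper: fetch the old entry, clear it, optionally serialize. */
static mock_pmd_t mock_get_and_clear(mock_pmd_t *pmdp, bool serialize)
{
	assert(pmdp != NULL);

	mock_pmd_t old = *pmdp;

	pmdp->val = 0;
	if (serialize)
		mock_serialize_against_walkers();
	return old;
}

/* Models the wrapper: skip serialization only for a fullmm teardown (full == 1). */
static mock_pmd_t mock_get_and_clear_full(mock_pmd_t *pmdp, int full)
{
	return mock_get_and_clear(pmdp, full != 1);
}
```

Calling mock_get_and_clear_full() with full == 1 clears the entry without touching the counter, while any other value of full also runs the serialize step, which is the munmap and MADV_DONTNEED path described above.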

MADV_DONTNEED has caused us multiple issues because it can run in parallel with a page fault. I am not sure whether we have a known/noticeable performance gain from allowing that with mmap_sem held in read mode.




-aneesh
