From: "Aneesh Kumar K.V" <aneesh.ku...@linux.vnet.ibm.com>

When we collapse normal pages into a hugepage, we first clear the pmd and
then invalidate all the PTE entries. The assumption is that any low-level
page fault will see the pmd as none and take the slow path, which waits on
mmap_sem. But a CPU could already be in hash_page with a local ptep pointer
value. Such a hash_page can add new HPTE entries for the normal
subpages/small pages. That means the page content could be modified while
we copy it to the huge page.

Fix this by waiting for hash_page to finish after marking the pmd none and
before invalidating the HPTE entries. We use the heavy
kick_all_cpus_sync(). This should be OK because we do this in the
background khugepaged thread and not in application context, but we do
block page fault handling for that time. Also, if we find collapse to be
slow, we can increase the scan rate.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index bbecac4..4bb44c3 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
 		pmd = *pmdp;
 		pmd_clear(pmdp);
 		/*
+		 * Wait for all pending hash_page to finish
+		 * We can do this by waiting for a context switch to happen on
+		 * the cpus. Any new hash_page after this will see pmd none
+		 * and fallback to code that takes mmap_sem and hence will block
+		 * for collapse to finish.
+		 */
+		kick_all_cpus_sync();
+		/*
 		 * Now invalidate the hpte entries in the range
 		 * covered by pmd. This make sure we take a
 		 * fault and will find the pmd as none, which will
-- 
1.8.1.2

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev