"Kirill A. Shutemov" <kirill.shute...@linux.intel.com> writes:
> On Fri, Feb 05, 2016 at 11:41:40PM +0530, Aneesh Kumar K.V wrote: >> With ppc64 we use the deposted pgtable_t to store the hash pte slot >> information. We should not withdraw the deposited pgtable_t without >> marking the pmd none. This ensure that low level hash fault handling >> will skip this huge pte and we will handle them at upper levels. We >> do take page table lock there and we can serialize against a parallel >> THP split there. Hence mark the pte none (ie, remove __PAGE_USER) before >> splitting the huge pmd. >> >> Also make sure we wait for irq disable section in other cpus to finish >> before flipping a huge pte entry with a regular pmd entry. Code paths >> like find_linux_pte_or_hugepte depend on irq disable to get >> a stable pte_t pointer. A parallel thp split need to make sure we >> don't convert a pmd pte to a regular pmd entry without waiting for the >> irq disable section to finish. >> >> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> > > Cc list is too short. At least akpm@ and linux-mm@ should be there. > Probably numa balancing folks. Will add them in the next iteration. > > Have you tested it with CONFIG_NUMA_BALANCING disabled? yes. > > I would expect some additional changes in this area would be required. > pmd_protnone() is always zero without numa balancing compiled in and > therefore I don't see where we will get this serialization agians ptl on > fault side. I am not really depending on the pmd_protnone definition here. The thing that I am depending with respect to the core code is that after taking ptl, all the code path should check for pmd using pmd_same. If found not matching they should force a retry. All code path within pmd_trans_huge() check seem to do so. > >> --- >> arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++++ >> arch/powerpc/mm/pgtable_64.c | 36 >> +++++++++++++++++++++++++++- >> include/asm-generic/pgtable.h | 8 +++++++ >> mm/huge_memory.c | 1 + >> 4 files changed, 48 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h >> b/arch/powerpc/include/asm/book3s/64/pgtable.h >> index 8d1c41d28318..0415856941e0 100644 >> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h >> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h >> @@ -281,6 +281,10 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct >> mm_struct *mm, pmd_t *pmdp); >> extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long >> address, >> pmd_t *pmdp); >> >> +#define __HAVE_ARCH_PMDP_HUGE_SPLITTING_FLUSH >> +extern void pmdp_huge_splitting_flush(struct vm_area_struct *vma, >> + unsigned long address, pmd_t *pmdp); > > I don't really like the name, but cannot think of anything better. same here. I will keep this as it is for now. ? > >> + >> #define pmd_move_must_withdraw pmd_move_must_withdraw >> struct spinlock; >> static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl, >> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c >> index 3124a20d0fab..d80a23a92f95 100644 >> --- a/arch/powerpc/mm/pgtable_64.c >> +++ b/arch/powerpc/mm/pgtable_64.c >> @@ -646,6 +646,31 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct >> *mm, pmd_t *pmdp) >> return pgtable; >> } -aneesh _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev