Prepare for lockless PGD init: enable the arch_pgd_init_late() callback
and add a 'careful' implementation of PGD init to it: only copy over
non-zero entries.

Since PGD entries only ever get added, this method catches any updates
to swapper_pg_dir[] that might have occurred between early PGD init
and late PGD init.

Note that this only matters for code that does not use the pgd_list but
the task list to find all PGDs in the system.

Subsequent patches will convert pgd_list users to task-list iterations.

[ This adds extra overhead in that we do the PGD initialization for a
  second time - a later patch will simplify this, once we don't
  have old pgd_list users. ]

Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/Kconfig      |  1 +
 arch/x86/mm/pgtable.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7e39f9b22705..15c19ce149f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_PGD_INIT_LATE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fb0a9dd1d6e4..e0bf90470d70 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -391,6 +391,63 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return NULL;
 }
 
+/*
+ * Initialize the kernel portion of the PGD.
+ *
+ * This is done separately, because pgd_alloc() happens when
+ * the task is not on the task list yet - and PGD updates
+ * happen by walking the task list.
+ *
+ * No locking is needed here, as we just copy over the reference
+ * PGD. The reference PGD (pgtable_init) is only ever expanded
+ * at the highest, PGD level. Thus any other task extending it
+ * will first update the reference PGD, then modify the task PGDs.
+ */
+void arch_pgd_init_late(struct mm_struct *mm, pgd_t *pgd)
+{
+	/*
+	 * This is called after a new MM has been made visible
+	 * in fork() or exec().
+	 *
+	 * This barrier makes sure the MM is visible to new RCU
+	 * walkers before we initialize it, so that we don't miss
+	 * updates:
+	 */
+	smp_wmb();
+
+	/*
+	 * If the pgd points to a shared pagetable level (either the
+	 * ptes in non-PAE, or shared PMD in PAE), then just copy the
+	 * references from swapper_pg_dir:
+	 */
+	if (CONFIG_PGTABLE_LEVELS == 2 ||
+	    (CONFIG_PGTABLE_LEVELS == 3 && SHARED_KERNEL_PMD) ||
+	    CONFIG_PGTABLE_LEVELS == 4) {
+
+		pgd_t *pgd_src = swapper_pg_dir + KERNEL_PGD_BOUNDARY;
+		pgd_t *pgd_dst =            pgd + KERNEL_PGD_BOUNDARY;
+		int i;
+
+		for (i = 0; i < KERNEL_PGD_PTRS; i++, pgd_src++, pgd_dst++) {
+			/*
+			 * This is lock-less, so it can race with PGD updates
+			 * coming from vmalloc() or CPA methods, but it's safe,
+			 * because:
+			 *
+			 * 1) this PGD is not in use yet, we have still not
+			 *    scheduled this task.
+			 * 2) we only ever extend PGD entries
+			 *
+			 * So if we observe a non-zero PGD entry we can copy it,
+			 * it won't change from under us. Parallel updates (new
+			 * allocations) will modify our (already visible) PGD:
+			 */
+			if (pgd_val(*pgd_src))
+				WRITE_ONCE(*pgd_dst, *pgd_src);
+		}
+	}
+}
+
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
 	pgd_mop_up_pmds(mm, pgd);
-- 
2.1.4
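
For readers who want to experiment with the "only copy over non-zero entries" rule outside the kernel, here is a minimal single-threaded userspace sketch. It is not part of the patch: model_pgd_t and model_pgd_copy_nonzero() are hypothetical names invented for illustration, and the sketch deliberately models neither smp_wmb() nor WRITE_ONCE(), only the property that zero entries in the reference table never overwrite what is already in the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pgd_t; the real type is arch-specific. */
typedef struct { uint64_t val; } model_pgd_t;

/*
 * Copy only the populated (non-zero) entries from src to dst and
 * return how many entries were copied. Slots that are zero in src
 * leave dst untouched, so an entry that a parallel updater already
 * installed in dst is never wiped out by the copy.
 */
static size_t model_pgd_copy_nonzero(model_pgd_t *dst,
				     const model_pgd_t *src, size_t n)
{
	size_t i, copied = 0;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < n; i++) {
		if (src[i].val != 0) {
			dst[i] = src[i];
			copied++;
		}
	}
	return copied;
}
```

With a source of { 0, A, 0, B } and a destination that already holds an entry in slot 2, the call copies A and B and leaves slot 2 alone, which is the behaviour the loop in arch_pgd_init_late() relies on.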