On Fri, Nov 21, 2014 at 9:22 AM, Andy Lutomirski <l...@amacapital.net> wrote:
>
> Both mystify me.  Why does the 32-bit version walk down the hierarchy
> at all instead of just touching the top level?

Quite frankly, I think it's just due to historical reasons, and should
be removed.

But the historical reasons are that with the aliasing of the PUD and
PMD entries in the PGD, it's all fairly confusing. So I think we only
used to do the top level, but then when we expanded from two levels to
three, that "top level" became the pmd, and then when we expanded from
three to four, the pmd was actually two levels down. So it's all
basically mindless work.

So I do think we could simplify and unify things.

In 32-bit mode, we actually have two different cases:

 - in PAE, there's the magic top-level 4-entry PGD that always *has*
   to be present (the P bit isn't actually checked by hardware).

   As a result, in PAE mode, the top PGD entries always exist, and are
   always prepopulated, and for the kernel area (including obviously
   the vmalloc space) always point to the init_pgd[] entry.

   Ergo, in PAE mode, I don't think we should ever hit this case in
   the first place.

 - in non-PAE mode, we should just copy the top-level entry, and return.

And in 64-bit mode, we only have the "copy the top-level entry" case.

So I think we should

 (a) remove the 32-bit vs 64-bit difference, because that's not
     actually valid

 (b) make it a PAE vs non-PAE difference

 (c) the PAE case is a no-op

 (d) the non-PAE case would look something like this:

    static noinline int vmalloc_fault(unsigned long address)
    {
            unsigned index;
            pgd_t *pgd_dst, pgd_entry;

            /* Make sure we are in vmalloc area: */
            if (!(address >= VMALLOC_START && address < VMALLOC_END))
                    return -1;

            index = pgd_index(address);
            pgd_entry = init_mm.pgd[index];
            if (!pgd_present(pgd_entry))
                    return -1;

            pgd_dst = __va(PAGE_MASK & read_cr3());
            if (pgd_present(pgd_dst[index]))
                    return -1;

            ACCESS_ONCE(pgd_dst[index]) = pgd_entry;
            return 0;
    }
    NOKPROBE_SYMBOL(vmalloc_fault);

and it's done.

Would anybody be willing to actually *test* something like the above?
The above may compile, but that's all the "testing" it got.
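
For anyone who wants to poke at the logic without booting a kernel, the
non-PAE "copy the top-level entry" case can be modeled in userspace.
This is only a sketch under stated assumptions: every model_* and
MODEL_* name is made up for illustration, the vmalloc range is an
arbitrary example, and the two plain arrays stand in for init_mm.pgd
and for the table that cr3 points at. It says nothing about whether the
real patch is correct.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only. All names here are hypothetical stand-ins,
 * not the kernel's real definitions.
 */
#define MODEL_PGDIR_SHIFT   22           /* non-PAE 32-bit: 4 MiB per top-level entry */
#define MODEL_PTRS_PER_PGD  1024
#define MODEL_PRESENT       0x1u         /* low bit marks an entry as present */
#define MODEL_VMALLOC_START 0xf8000000u  /* assumed range, for illustration only */
#define MODEL_VMALLOC_END   0xff000000u

typedef struct { uint32_t val; } model_pgd_t;

static int model_pgd_present(model_pgd_t entry)
{
	return (entry.val & MODEL_PRESENT) != 0;
}

static size_t model_pgd_index(uint32_t address)
{
	return address >> MODEL_PGDIR_SHIFT;
}

/*
 * ref: the reference table (plays the role of init_mm.pgd)
 * dst: the table of the faulting context (plays the role of cr3's table)
 *
 * Returns 0 if the top-level entry was copied, -1 if the fault cannot
 * be fixed up this way.
 */
static int model_vmalloc_fault(const model_pgd_t *ref, model_pgd_t *dst,
			       uint32_t address)
{
	size_t index;

	/* Only addresses inside the modeled vmalloc range qualify. */
	if (address < MODEL_VMALLOC_START || address >= MODEL_VMALLOC_END)
		return -1;

	index = model_pgd_index(address);

	/* Nothing to copy if the reference table has no mapping. */
	if (!model_pgd_present(ref[index]))
		return -1;

	/* If the entry is already there, the fault had another cause. */
	if (model_pgd_present(dst[index]))
		return -1;

	dst[index] = ref[index];
	return 0;
}
```

Calling model_vmalloc_fault() twice for the same address shows the
intended behavior: the first call copies the entry and returns 0, the
second returns -1 because the destination entry is already present.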

                 Linus