On 16.05.19 13:16, Michal Hocko wrote: > On Thu 16-05-19 16:36:12, Anshuman Khandual wrote: >> On 05/16/2019 03:53 PM, Mark Rutland wrote: >>> Hi Michal, >>> >>> On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote: >>>> On Tue 14-05-19 14:30:05, Anshuman Khandual wrote: >>>>> The arm64 pagetable dump code can race with concurrent modification of the >>>>> kernel page tables. When a leaf entries are modified concurrently, the >>>>> dump >>>>> code may log stale or inconsistent information for a VA range, but this is >>>>> otherwise not harmful. >>>>> >>>>> When intermediate levels of table are freed, the dump code will continue >>>>> to >>>>> use memory which has been freed and potentially reallocated for another >>>>> purpose. In such cases, the dump code may dereference bogus addressses, >>>>> leading to a number of potential problems. >>>>> >>>>> Intermediate levels of table may by freed during memory hot-remove, or >>>>> when >>>>> installing a huge mapping in the vmalloc region. To avoid racing with >>>>> these >>>>> cases, take the memory hotplug lock when walking the kernel page table. >>>> >>>> Why is this a problem only on arm64 >>> >>> It looks like it's not -- I think we're just the first to realise this. >>> >>> AFAICT x86's debugfs ptdump has the same issue if run conccurently with >>> memory hot remove. If 32-bit arm supported hot-remove, its ptdump code >>> would have the same issue. >>> >>>> and why do we even care for debugfs? Does anybody rely on this thing >>>> to be reliable? Do we even need it? Who is using the file? >>> >>> The debugfs part is used intermittently by a few people working on the >>> arm64 kernel page tables. We use that both to sanity-check that kernel >>> page tables are created/updated correctly after changes to the arm64 mmu >>> code, and also to debug issues if/when we encounter issues that appear >>> to be the result of kernel page table corruption. >>> >>> So while it's rare to need it, it's really useful to have when we do >>> need it, and I'd rather not remove it. I'd also rather that it didn't >>> have latent issues where we can accidentally crash the kernel when using >>> it, which is what this patch is addressing. >>> >>>> I am asking because I would really love to make mem hotplug locking less >>>> scattered outside of the core MM than more. Most users simply shouldn't >>>> care. Pfn walkers should rely on pfn_to_online_page. >>> >>> I'm not sure if that would help us here; IIUC pfn_to_online_page() alone >>> doesn't ensure that the page remains online. Is there a way to achieve >>> that other than get_online_mems()? >> >> Still wondering how pfn_to_online_page() is applicable here. It validates >> a given PFN and whether its online from sparse section mapping perspective >> before giving it's struct page. IIUC it is used during a linear scanning >> of a physical address range not for a page table walk. So how it can solve >> the problem when a struct page which was used as an intermediate level page >> table page gets released back to the buddy from another concurrent thread ? > > Well, my comment about pfn_to_online_page was more generic and it might > not apply to this specific case. I meant to say that the code outside of > the core MM shouldn't really care about the hotplug locking. >
What am I missing, how is it guaranteed that a page doesn't get offlined/removed without holding a lock here? We would at least need some RCU mechnism or similar to sync against pages vanishing. pfn_to_online_page() assumes that somebody touches a page he doesn't own. There has to be some way for core-mm to realize this and defer offlining/removinf. -- Thanks, David / dhildenb