On 3/27/25 09:58, Linus Torvalds wrote:
> On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki <spassw...@web.de> wrote:
>>
>> As Balbir Singh found out this memory comes from amdkfd
>> (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets
>> placed by devm_request_free_mem_region() which places the memory at the
>> end of the physical address space (DIRECT_MAP_PHYSMEM_END).
>> DIRECT_MAP_PHYSMEM_END changes when using nokaslr and so the memory
>> shifts.
>
> So I just want to say that having followed the thread as a spectator,
> big kudos to everybody involved in this thing. Particularly to you,
> Bert, for all your debugging and testing, and to Balbir for following
> up and figuring it out.
>
> Because this was a strange one.
>

Thanks!

>> One can work around this by removing the GFR_DESCENDING flag from
>> devm_request_free_mem_region() so the memory gets placed right after the
>> other resources:
>
> I worry that there might be other machines where that completely breaks
> things.
>
> There are various historical reasons why we look for addresses in high
> regions, ie on machines where there are various hidden IO regions that
> aren't enumerated by e820 and aren't found by our usual PCI BAR
> discovery because they are special hidden ones.
>
> So then users of [devm_]request_free_mem_region() might end up getting
> allocated a region that has some magic system resource in it.
>
> And no, this shouldn't happen on any normal machine, but it has
> definitely been a thing in the past.
>
> So I'm very happy that you guys figured out what ended up happening,
> but I'm not convinced that the devm_request_free_mem_region()
> workaround is tenable.
>
> So I think it needs to be more targeted to the HSA_AMD_SVM case than
> touch the devm_request_free_mem_region() logic for everybody.
>

I agree with your assessment. I was looking at whether bumping up
max_pfn for DEVICE_PRIVATE memory mappings via add_pages() is the right
thing to do, but I have not yet completed my code search.

From my understanding, max_pfn should be used as the end of system RAM
and direct_map_physmem_end as the end of addressable memory. I proposed
not updating max_pfn for zone-device-based add_pages() on x86 via a test
patch that worked for Bert. This allows HSA_AMD_SVM, nokaslr and
PCI_P2PDMA to all co-exist, but I still need to audit all of the max_pfn
usage and assumptions.

Balbir Singh
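
For illustration only, the distinction drawn above between max_pfn (end
of system RAM) and direct_map_physmem_end (end of addressable memory) can
be sketched as a toy userspace model in C. This is not the test patch and
not kernel code: struct phys_model, model_add_pages() and the zone_device
argument are hypothetical names made up for this sketch, and the real
add_pages() path on x86 has a different signature and does much more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct phys_model {
	uint64_t max_pfn;            /* models "end of system RAM" (exclusive) */
	uint64_t direct_map_end_pfn; /* models "end of addressable memory" (exclusive) */
};

/*
 * Model hot-adding the PFN range [start_pfn, start_pfn + nr_pages).
 * The range must fit below the addressable limit. Only real RAM raises
 * max_pfn; zone-device (e.g. DEVICE_PRIVATE) ranges leave it untouched,
 * which is the behaviour proposed in the text above.
 */
static bool model_add_pages(struct phys_model *m, uint64_t start_pfn,
			    uint64_t nr_pages, bool zone_device)
{
	assert(nr_pages > 0);

	uint64_t end_pfn = start_pfn + nr_pages;

	/* Reject arithmetic overflow and ranges past the addressable limit. */
	if (end_pfn < start_pfn || end_pfn > m->direct_map_end_pfn)
		return false;

	if (!zone_device && end_pfn > m->max_pfn)
		m->max_pfn = end_pfn;

	return true;
}
```

In this model, a device-private range placed at the very top of the
addressable space is accepted without moving max_pfn, while a later hot-add
of ordinary RAM still raises it. Whether real max_pfn users tolerate that
split is exactly what the audit mentioned above would have to establish.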