On Wednesday, 26.03.2025 at 21:36 +1100, Balbir Singh wrote:
> On 3/26/25 21:10, Bert Karwatzki wrote:
> > On Wednesday, 26.03.2025 at 12:50 +1100, Balbir Singh wrote:
> > > On 3/26/25 10:43, Balbir Singh wrote:
> > > > On 3/26/25 10:21, Bert Karwatzki wrote:
> > > > > On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
> > > > > >
> > > > > > The second region seems to be additional, I suspect that is HMM
> > > > > > mapping from kgd2kfd_init_zone_device()
> > > > > >
> > > > > > Balbir Singh
> > > > > >
> > > > > Good guess! I inserted a printk into kgd2kfd_init_zone_device():
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > index d05d199b5e44..201220e2ac42 100644
> > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > @@ -1049,6 +1049,8 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
> > > > >  		pgmap->range.end = res->end;
> > > > >  		pgmap->type = MEMORY_DEVICE_PRIVATE;
> > > > >  	}
> > > > > +	dev_info(adev->dev, "%s: range.start = 0x%llx ranges.end = 0x%llx\n",
> > > > > +			__func__, pgmap->range.start, pgmap->range.end);
> > > > >
> > > > >  	pgmap->nr_range = 1;
> > > > >  	pgmap->ops = &svm_migrate_pgmap_ops;
> > > > >
> > > > >
> > > > > and get this in the case without nokaslr:
> > > > >
> > > > > [ T367] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device: range.start = 0xafe00000000 ranges.end = 0xaffffffffff
> > > > >
> > > > > and this in the case with nokaslr:
> > > > >
> > > > > [ T365] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device: range.start = 0x3ffe00000000 ranges.end = 0x3fffffffffff
> > > > >
> > > >
> > > > So we should ignore the second region then for the purposes of this
> > > > issue.
> > > >
> > > > I think this now boils down to
> > > >
> > > > Why is the dma_get_required_mask set to all of addressable memory
> > > > (46 bits) when we have nokaslr?
> > > >
> > >
> > > I think I know the root cause of the required_mask going up and hence
> > > the use of DMA32
> > >
> > > 1. HMM calls add_pages()
> > > 2. add_pages() calls update_end_of_memory_vars()
> > > 3. This updates max_pfn and that causes required_mask to go up to 46 bits
> > >
> > > Do you have CONFIG_HSA_AMD_SVM enabled? Does turning it off fix the
> > > issue?
> > >
> > > The actual issue is the update of max_pfn.
> > >
> > > Balbir Singh
> > >
> >
> > Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, the strange memory
> > resource
> > afe00000000-affffffffff : 0000:03:00.0
> > is gone.
> >
> > If one would add a max_phys_addr argument to devm_request_free_mem_region()
> > (which returns the resource addr in kgd2kfd_init_zone_device()) one could
> > keep the memory below the 44bit limit with CONFIG_HSA_AMD_SVM enabled.
> >
>
> Thanks for reporting the result. Does this patch work?
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df303..14f42f8012ab 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -968,8 +968,9 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>  	WARN_ON_ONCE(ret);
>
>  	/* update max_pfn, max_low_pfn and high_memory */
> -	update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> -				  nr_pages << PAGE_SHIFT);
> +	if (!params->pgmap)
> +		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> +					  nr_pages << PAGE_SHIFT);
>
>  	return ret;
>  }
>
> It basically prevents max_pfn from moving when the inserted memory is
> zone_device.
>
> FYI: It's a test patch and will still create issues if the amount of
> present memory (physically) is very high, because the driver needs to
> enable use_dma32 in that case.
>
> If you could try this with everything back to the original config with
> both kaslr/nokaslr, that would be very helpful.
>
> Thanks,
> Balbir Singh

Yes, this fixes the issue with Stellaris and Civilization6. The memory still
shifts as usual in /proc/iomem:

afe00000000-affffffffff : 0000:03:00.0 (without nokaslr)
3ffe00000000-3fffffffffff : 0000:03:00.0 (with nokaslr)

but without the change in max_pfn this has no impact on the required DMA mask.

Bert Karwatzki