On 3/26/25 21:10, Bert Karwatzki wrote:
> Am Mittwoch, dem 26.03.2025 um 12:50 +1100 schrieb Balbir Singh:
>> On 3/26/25 10:43, Balbir Singh wrote:
>>> On 3/26/25 10:21, Bert Karwatzki wrote:
>>>> Am Mittwoch, dem 26.03.2025 um 09:45 +1100 schrieb Balbir Singh:
>>>>>
>>>>>
>>>>> The second region seems to be additional, I suspect that is HMM mapping 
>>>>> from kgd2kfd_init_zone_device()
>>>>>
>>>>> Balbir Singh
>>>>>
>>>> Good guess! I inserted a printk into kgd2kfd_init_zone_device():
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>>> index d05d199b5e44..201220e2ac42 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>>> @@ -1049,6 +1049,8 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
>>>>                 pgmap->range.end = res->end;
>>>>                 pgmap->type = MEMORY_DEVICE_PRIVATE;
>>>>         }
>>>> +       dev_info(adev->dev, "%s: range.start = 0x%llx ranges.end = 0x%llx\n",
>>>> +                       __func__, pgmap->range.start, pgmap->range.end);
>>>>
>>>>         pgmap->nr_range = 1;
>>>>         pgmap->ops = &svm_migrate_pgmap_ops;
>>>>
>>>>
>>>> and get this in the case without nokaslr:
>>>>
>>>> [    T367] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device: range.start = 0xafe00000000 ranges.end = 0xaffffffffff
>>>>
>>>> and this in the case with nokaslr:
>>>>
>>>> [    T365] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device: range.start = 0x3ffe00000000 ranges.end = 0x3fffffffffff
>>>>
>>>
>>> So we should ignore the second region then for the purposes of this issue.
>>>
>>> I think this now boils down to
>>>
>>> Why is dma_get_required_mask() set to all of addressable memory (46 bits)
>>> when we have nokaslr?
>>>
>>
>> I think I know the root cause of the required mask going up, and hence of
>> the use of DMA32:
>>
>> 1. HMM calls add_pages()
>> 2. add_pages() calls update_end_of_memory_vars()
>> 3. This updates max_pfn, which causes the required mask to go up to 46 bits
>>
>> Do you have CONFIG_HSA_AMD_SVM enabled? Does turning it off fix the issue?
>>
>> The actual issue is the update of max_pfn.
>>
>> Balbir Singh
>>
> 
> Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, and the strange memory
> resource
> afe00000000-affffffffff : 0000:03:00.0
> is gone.
> 
> If one added a max_phys_addr argument to devm_request_free_mem_region()
> (which returns the resource address used in kgd2kfd_init_zone_device()), one
> could keep the memory below the 44-bit limit with CONFIG_HSA_AMD_SVM enabled.
> 

Thanks for reporting the result. Does this patch work?

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 01ea7c6df303..14f42f8012ab 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -968,8 +968,9 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
        WARN_ON_ONCE(ret);
 
        /* update max_pfn, max_low_pfn and high_memory */
-       update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
-                                 nr_pages << PAGE_SHIFT);
+       if (!params->pgmap)
+               update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
+                                         nr_pages << PAGE_SHIFT);
 
        return ret;
 }

It basically prevents max_pfn from moving when the inserted memory is
ZONE_DEVICE memory.
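A minimal userspace model of the max_pfn side of the patch above, assuming a simplified update_end_of_memory_vars() (the real one also updates max_low_pfn and high_memory). The globals and structs here are hypothetical stand-ins for the kernel's max_pfn, struct dev_pagemap and struct mhp_params, and add_pages_tail() is a hypothetical name for just the tail of add_pages():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical stand-ins for the kernel's global max_pfn and for
 * struct dev_pagemap / struct mhp_params. */
static uint64_t max_pfn;
struct dev_pagemap { int unused; };
struct mhp_params { struct dev_pagemap *pgmap; };

/* Simplified update_end_of_memory_vars(): max_pfn only ever grows.
 * The real function also updates max_low_pfn and high_memory. */
static void update_end_of_memory_vars(uint64_t start, uint64_t size)
{
	uint64_t end_pfn = PFN_UP(start + size);

	if (end_pfn > max_pfn)
		max_pfn = end_pfn;
}

/* Tail of add_pages() with the test patch applied: only memory hot-added
 * without a pgmap (ordinary RAM) moves max_pfn; ZONE_DEVICE ranges, such
 * as the one set up by kgd2kfd_init_zone_device(), leave it alone. */
static void add_pages_tail(uint64_t start_pfn, uint64_t nr_pages,
			   const struct mhp_params *params)
{
	assert(params != NULL); /* add_pages() is always given params */
	if (!params->pgmap)
		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
					  nr_pages << PAGE_SHIFT);
}
```

For example, starting from a hypothetical max_pfn of 0x800000 (32 GiB), hot-adding the 8 GiB pgmap range from the nokaslr log (0x3ffe00000000 to 0x3fffffffffff) leaves max_pfn unchanged, while a RAM hot-add without a pgmap still raises it.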

FYI: this is a test patch, and it will still create issues if the amount of
physically present memory is very high, because the driver needs to enable
use_dma32 in that case.
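The caveat can be checked with some arithmetic. This is a userspace sketch, assuming the required mask is computed as in dma_direct_get_required_mask() (no IOMMU, identity phys-to-DMA translation) and that the driver's use_dma32 decision follows a simplified dma_addressing_limited() that ignores bus_dma_limit. fls64_model(), required_mask() and addressing_limited() are hypothetical helper names, and 44 bits is the device limit mentioned earlier in the thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* 1-based index of the most significant set bit, like the kernel's fls64(). */
static int fls64_model(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/* Same arithmetic as dma_direct_get_required_mask(): the smallest
 * all-ones mask that covers every address below max_pfn. */
static uint64_t required_mask(uint64_t max_pfn)
{
	uint64_t phys;

	assert(max_pfn > 1); /* avoids shifting by -1 below */
	phys = (max_pfn - 1) << PAGE_SHIFT;
	return (1ULL << (fls64_model(phys) - 1)) * 2 - 1;
}

/* Simplified dma_addressing_limited(): true when the device mask cannot
 * cover all memory, i.e. the case where use_dma32 would be wanted. */
static bool addressing_limited(uint64_t dev_mask, uint64_t max_pfn)
{
	return dev_mask < required_mask(max_pfn);
}
```

With these helpers, the max_pfn implied by the nokaslr range (0x400000000) gives a required mask of 0x3fffffffffff (46 bits), which a 44-bit device mask cannot cover, while the KASLR range (0xb0000000) gives 0xfffffffffff, which fits. That is consistent with the problem showing up only with nokaslr. Even with the patch, RAM ending above 16 TiB (max_pfn > 1 << 32) would still push the required mask past 44 bits.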

If you could try this with everything set back to the original config, with
both kaslr and nokaslr, that would be very helpful.

Thanks,
Balbir Singh
