On 3/24/25 23:14, Christian König wrote:
> On 24.03.25 at 12:23, Bert Karwatzki wrote:
>> On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
>>> On 3/22/25 23:23, Bert Karwatzki wrote:
...
So why is use_dma32 enabled with nokaslr? Some more printk()s give this
result:
On 24.03.25 at 12:23, Bert Karwatzki wrote:
> On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
>> On 3/22/25 23:23, Bert Karwatzki wrote:
>>> ...
>>> So why is use_dma32 enabled with nokaslr? Some more printk()s give this
>>> result:
>>>
>>> The GPUs:
>>> built-in:
>>> 08:00.0 VGA
On 3/24/25 22:23, Bert Karwatzki wrote:
> On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
>> On 3/22/25 23:23, Bert Karwatzki wrote:
>>> The problem occurs in this part of ttm_tt_populate(), in the nokaslr case
>>> the loop is entered and repeatedly run because ttm_dma32_pages alloc
On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
> On 3/22/25 23:23, Bert Karwatzki wrote:
> > The problem occurs in this part of ttm_tt_populate(), in the nokaslr case
> > the loop is entered and repeatedly run because ttm_dma32_pages allocated
> > exceeds
> > the ttm_dma32_pages_l
On 3/27/25 21:53, Ingo Molnar wrote:
>
> * Balbir Singh wrote:
>
>>> Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, the strange memory
>>> resource
>>> afe-aff : :03:00.0
>>> is gone.
>>>
>>> If one would add a max_phys_addr argument to devm_request_free_mem_region()
>
On Wednesday, 26.03.2025 at 15:58 -0700, Linus Torvalds wrote:
> On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki wrote:
> >
> > As Balbir Singh found out this memory comes from amdkfd
> > (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets
> > placed
> > by devm_request_free_
* Balbir Singh wrote:
> > Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, the strange memory
> > resource
> > afe-aff : :03:00.0
> > is gone.
> >
> > If one would add a max_phys_addr argument to devm_request_free_mem_region()
> > (which return the resource addr in kgd
* Linus Torvalds wrote:
> On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki wrote:
> >
> > As Balbir Singh found out this memory comes from amdkfd
> > (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets
> > placed
> > by devm_request_free_mem_region() which places the memory at
On Tuesday, 25.03.2025 at 13:23 +0100, Christian König wrote:
> On 25.03.25 at 11:14, Bert Karwatzki wrote:
> > My /proc/iomem contains two memory areas of 8G size which
> > belong to PCI :03:00.0, one of them is the BAR reported by dmesg
> > [ 0.312692] [ T1] pci :03:00.0: BAR
On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki wrote:
>
> As Balbir Singh found out this memory comes from amdkfd
> (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets placed
> by devm_request_free_mem_region() which places the memory at the end of the
> physical address space (D
On 3/27/25 09:58, Linus Torvalds wrote:
> On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki wrote:
>>
>> As Balbir Singh found out this memory comes from amdkfd
>> (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets
>> placed
>> by devm_request_free_mem_region() which places the me
On Wednesday, 26.03.2025 at 12:50 +1100, Balbir Singh wrote:
> On 3/26/25 10:43, Balbir Singh wrote:
> > On 3/26/25 10:21, Bert Karwatzki wrote:
> > > On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
> > > >
> > > >
> > > > The second region seems to be additional, I suspect tha
On Wednesday, 26.03.2025 at 21:36 +1100, Balbir Singh wrote:
> On 3/26/25 21:10, Bert Karwatzki wrote:
> > On Wednesday, 26.03.2025 at 12:50 +1100, Balbir Singh wrote:
> > > On 3/26/25 10:43, Balbir Singh wrote:
> > > > On 3/26/25 10:21, Bert Karwatzki wrote:
> > > > > Am Mittwoch, dem 26.0
On 3/26/25 21:10, Bert Karwatzki wrote:
> On Wednesday, 26.03.2025 at 12:50 +1100, Balbir Singh wrote:
>> On 3/26/25 10:43, Balbir Singh wrote:
>>> On 3/26/25 10:21, Bert Karwatzki wrote:
On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
>
>
> The second region s
On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
>
>
> The second region seems to be additional, I suspect that is HMM mapping from
> kgd2kfd_init_zone_device()
>
> Balbir Singh
>
Good guess! I inserted a printk into kgd2kfd_init_zone_device():
diff --git a/drivers/gpu/drm/amd/amd
On 3/26/25 10:43, Balbir Singh wrote:
> On 3/26/25 10:21, Bert Karwatzki wrote:
>> On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
>>>
>>>
>>> The second region seems to be additional, I suspect that is HMM mapping
>>> from kgd2kfd_init_zone_device()
>>>
>>> Balbir Singh
>>>
>> Go
On 3/26/25 10:21, Bert Karwatzki wrote:
> On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
>>
>>
>> The second region seems to be additional, I suspect that is HMM mapping from
>> kgd2kfd_init_zone_device()
>>
>> Balbir Singh
>>
> Good guess! I inserted a printk into kgd2kfd_init_z
On 3/25/25 18:35, Christian König wrote:
> On 24.03.25 at 23:48, Balbir Singh wrote:
lspci -v reports 8G of memory at 0xfc so I assumed that is the
GPU RAM.
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
[Radeon RX 6600/6600 XT/6600M] (rev
My /proc/iomem contains two memory areas of 8G size which
belong to PCI :03:00.0, one of them is the BAR reported by dmesg
[ 0.312692] [ T1] pci :03:00.0: BAR 0 [mem 0xfc-0xfd 64bit
pref]
the other one is "afe-aff : :03:00.0" (in the case without
I did some monitoring using this patch (on top of 6.12.18):
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 0760e70402ec..ccd0c9058cee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_m
On 25.03.25 at 11:14, Bert Karwatzki wrote:
> My /proc/iomem contains two memory areas of 8G size which
> belong to PCI :03:00.0, one of them is the BAR reported by dmesg
> [ 0.312692] [ T1] pci :03:00.0: BAR 0 [mem 0xfc-0xfd
> 64bit pref]
> the other one is "afe000
On 3/25/25 19:36, Christian König wrote:
> On 25.03.25 at 00:07, Bert Karwatzki wrote:
>> Here's the dmesg from linux-next-6.14-rc7-next20250321 (CONFIG_PCI_P2PDMA
>> not set)
>> The memory ranges of (afe-aff) or
>> (3ffe-3fff) are
>> mentioned in neither of them.
On 25.03.25 at 00:07, Bert Karwatzki wrote:
> Here's the dmesg from linux-next-6.14-rc7-next20250321 (CONFIG_PCI_P2PDMA not
> set)
> The memory ranges of (afe-aff) or (3ffe-3fff)
> are
> mentioned in neither of them.
Ugh, next time either in two mails or as attac
On 24.03.25 at 23:48, Balbir Singh wrote:
>>> lspci -v reports 8G of memory at 0xfc so I assumed that is the GPU
>>> RAM.
>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
>>> [Radeon RX 6600/6600 XT/6600M] (rev c3)
>>> Subsystem: Micro-Star International
The problem occurs in this part of ttm_tt_populate(), in the nokaslr case
the loop is entered and repeatedly run because ttm_dma32_pages allocated exceeds
the ttm_dma32_pages_limit which leads to lots of calls to ttm_global_swapout().
if (!strcmp(get_current()->comm, "stellaris"))
printk(K
On 3/22/25 23:23, Bert Karwatzki wrote:
> The problem occurs in this part of ttm_tt_populate(), in the nokaslr case
> the loop is entered and repeatedly run because ttm_dma32_pages allocated
> exceeds
> the ttm_dma32_pages_limit which leads to lots of calls to
> ttm_global_swapout().
>
> if (!st
On 3/22/25 19:04, Ingo Molnar wrote:
>
> * Balbir Singh wrote:
>
>>> How frequently does this happen and what is the impact to users if
>>> this happens?
>>
>> It happens 100% of the time when the BAR space lies beyond the
>> 10TiB region.
>
> And how frequently is the BAR space beyond the
* Balbir Singh wrote:
> > How frequently does this happen and what is the impact to users if
> > this happens?
>
> It happens 100% of the time when the BAR space lies beyond the
> 10TiB region.
And how frequently is the BAR space beyond the 10TiB region, on modern
systems?
Thanks,
On 3/21/25 23:26, Bert Karwatzki wrote:
>>>
>>
>> I am not an expert in amdgpu or gtt_mgr, but I wonder if some of the deletes
>> are coming
>> from forceful eviction of memory during allocation?
>>
>> Have you filed a bug report for the nokaslr case?
>>
>> Balbir Singh
>
> I did some more monit
> >
>
> I am not an expert in amdgpu or gtt_mgr, but I wonder if some of the deletes
> are coming
> from forceful eviction of memory during allocation?
>
> Have you filed a bug report for the nokaslr case?
>
> Balbir Singh
I did some more monitoring and in the BAD case ttm_global_swapout() is ca
On 3/21/25 21:24, Ingo Molnar wrote:
>
> * Balbir Singh wrote:
>
>> On 3/20/25 20:01, Ingo Molnar wrote:
>>>
>>> * Balbir Singh wrote:
>>>
On 3/17/25 00:09, Bert Karwatzki wrote:
> This is related to the amdgpu.gttsize. My laptop has the maximum amount
> of memory (64G) and usuall
* Balbir Singh wrote:
> On 3/20/25 20:01, Ingo Molnar wrote:
> >
> > * Balbir Singh wrote:
> >
> >> On 3/17/25 00:09, Bert Karwatzki wrote:
> >>> This is related to the amdgpu.gttsize. My laptop has the maximum amount
> >>> of memory (64G) and usually gttsize is half of main memory size. I
On 3/21/25 10:43, Bert Karwatzki wrote:
> I did some monitoring using this patch (on top of 6.12.18):
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 0760e70402ec..ccd0c9058cee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt
On 3/20/25 20:01, Ingo Molnar wrote:
>
> * Balbir Singh wrote:
>
>> On 3/17/25 00:09, Bert Karwatzki wrote:
>>> This is related to the amdgpu.gttsize. My laptop has the maximum amount
>>> of memory (64G) and usually gttsize is half of main memory size. I just
>>> tested with cmdline="nokaslr a
* Balbir Singh wrote:
> On 3/17/25 00:09, Bert Karwatzki wrote:
> > This is related to the amdgpu.gttsize. My laptop has the maximum amount
> > of memory (64G) and usually gttsize is half of main memory size. I just
> > tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does
>
On Sunday, 16.03.2025 at 14:09 +0100, Bert Karwatzki wrote:
> This is related to the amdgpu.gttsize. My laptop has the maximum amount
> of memory (64G) and usually gttsize is half of main memory size. I just
> tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does
> not occ
This is related to the amdgpu.gttsize. My laptop has the maximum amount
of memory (64G) and usually gttsize is half of main memory size. I just
tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does
not occur. So I did some more testing with varying gttsize and got this
for the b
On 3/17/25 00:09, Bert Karwatzki wrote:
> This is related to the amdgpu.gttsize. My laptop has the maximum amount
> of memory (64G) and usually gttsize is half of main memory size. I just
> tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does
> not occur. So I did some more tes
On Fri, Mar 14, 2025 at 8:42 PM Balbir Singh wrote:
>
> On 3/15/25 01:18, Bert Karwatzki wrote:
> > On Saturday, 15.03.2025 at 00:34 +1100, Balbir Singh wrote:
> >> On 3/14/25 17:14, Balbir Singh wrote:
> >>> On 3/14/25 09:22, Bert Karwatzki wrote:
> Am Freitag, dem 14.03.2025 um 08:54 +1
On Saturday, 15.03.2025 at 00:34 +1100, Balbir Singh wrote:
> On 3/14/25 17:14, Balbir Singh wrote:
> > On 3/14/25 09:22, Bert Karwatzki wrote:
> > > On Friday, 14.03.2025 at 08:54 +1100, Balbir Singh wrote:
> > > > On 3/14/25 05:12, Bert Karwatzki wrote:
> > > > > Am Donnerstag, dem 13.0
On 3/14/25 17:14, Balbir Singh wrote:
> On 3/14/25 09:22, Bert Karwatzki wrote:
>> On Friday, 14.03.2025 at 08:54 +1100, Balbir Singh wrote:
>>> On 3/14/25 05:12, Bert Karwatzki wrote:
On Thursday, 13.03.2025 at 22:47 +1100, Balbir Singh wrote:
>
>
> Anyway, I think th
On 3/15/25 01:18, Bert Karwatzki wrote:
> On Saturday, 15.03.2025 at 00:34 +1100, Balbir Singh wrote:
>> On 3/14/25 17:14, Balbir Singh wrote:
>>> On 3/14/25 09:22, Bert Karwatzki wrote:
On Friday, 14.03.2025 at 08:54 +1100, Balbir Singh wrote:
> On 3/14/25 05:12, Bert Karwatzki
On Friday, 14.03.2025 at 08:54 +1100, Balbir Singh wrote:
> On 3/14/25 05:12, Bert Karwatzki wrote:
> > On Thursday, 13.03.2025 at 22:47 +1100, Balbir Singh wrote:
> > >
> > >
> > > Anyway, I think the nokaslr result is interesting, it seems like with
> > > nokaslr
> > > even the olde
On 3/14/25 09:22, Bert Karwatzki wrote:
> On Friday, 14.03.2025 at 08:54 +1100, Balbir Singh wrote:
>> On 3/14/25 05:12, Bert Karwatzki wrote:
>>> On Thursday, 13.03.2025 at 22:47 +1100, Balbir Singh wrote:
Anyway, I think the nokaslr result is interesting, it seems like
On 3/14/25 05:12, Bert Karwatzki wrote:
> On Thursday, 13.03.2025 at 22:47 +1100, Balbir Singh wrote:
>>
>>
>> Anyway, I think the nokaslr result is interesting, it seems like with nokaslr
>> even the older kernels have problems with the game
>>
>> Could you confirm if with nokaslr
>>
> Now