devcoredump is used to debug GPU hangs/resets. In the normal flow, when
there is a hang due to a ring timeout or page fault, we do a hard
reset because soft reset fails in those cases. How are we making sure that the
devcoredump is triggered and captured in those cases?
Regards
Sunil Khatri
On
On 4/17/2024 1:06 PM, Khatri, Sunil wrote:
devcoredump is used to debug gpu hangs/resets. So in normal process
when there is a hang due to ring timeout or page fault we are doing a
hard reset as soft reset fail in those cases. How are we making sure
that the devcoredump is triggered in those
On 4/17/2024 1:14 PM, Khatri, Sunil wrote:
>
> On 4/17/2024 1:06 PM, Khatri, Sunil wrote:
>> devcoredump is used to debug gpu hangs/resets. So in normal process
>> when there is a hang due to ring timeout or page fault we are doing a
>> hard reset as soft reset fail in those cases. How are we m
Add the prototype to dump ip registers
for all ips of different asics and set
them to NULL for now. Based on the
requirement add a function pointer for
each of them.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_m
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 +
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 ++
Add the prototype for print ip state to be used
to print the registers in devcoredump during
a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
Add ip dump for each ip of the asic in the
devcoredump for all the ips where a callback
is registered for register dump.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 14 ++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Add support to print ip information to be
used to print registers in devcoredump
buffer.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/
Invoke the dump_ip_state function for each ip before
the asic resets and save the register values for
debugging via devcoredump.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu
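
The series above describes an optional per-IP callback that is left NULL where unimplemented, called for each IP before the ASIC resets, and paired with a print callback that writes into the devcoredump buffer. The following is a minimal userspace sketch of that pattern, not the amdgpu implementation. Only the callback names dump_ip_state and print_ip_state come from the commit messages; struct ip_block, struct ip_funcs, fake_read_reg(), capture_ip_state() and format_ip_state() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REGS 4

struct ip_block;

/* Hypothetical callback table; a NULL entry means "nothing to dump". */
struct ip_funcs {
    void (*dump_ip_state)(struct ip_block *ip);
    int (*print_ip_state)(const struct ip_block *ip, char *buf, size_t len);
};

/* Hypothetical IP block with a small cache for saved register values. */
struct ip_block {
    const char *name;
    const struct ip_funcs *funcs;
    uint32_t saved[MAX_REGS];
    size_t num_saved;
};

/* Stand-in for a hardware register read (assumption: no real MMIO here). */
static uint32_t fake_read_reg(size_t index)
{
    return 0x1000u + (uint32_t)index;
}

/* Save register values while the hardware state is still intact. */
static void gfx_dump_ip_state(struct ip_block *ip)
{
    for (size_t i = 0; i < MAX_REGS; i++)
        ip->saved[i] = fake_read_reg(i);
    ip->num_saved = MAX_REGS;
}

/* Format the saved values into buf; returns the number of bytes written. */
static int gfx_print_ip_state(const struct ip_block *ip, char *buf, size_t len)
{
    int used = 0;

    for (size_t i = 0; i < ip->num_saved; i++) {
        size_t room = len - (size_t)used;
        int n = snprintf(buf + used, room, "%s reg[%zu] = 0x%08x\n",
                         ip->name, i, (unsigned)ip->saved[i]);
        if (n < 0 || (size_t)n >= room)
            break; /* stop once the buffer is full */
        used += n;
    }
    return used;
}

static const struct ip_funcs gfx_funcs = {
    .dump_ip_state = gfx_dump_ip_state,
    .print_ip_state = gfx_print_ip_state,
};

/* An IP that has not implemented the callbacks yet. */
static const struct ip_funcs no_dump_funcs = {
    .dump_ip_state = NULL,
    .print_ip_state = NULL,
};

/* Call before the reset: run every registered dump callback. */
static size_t capture_ip_state(struct ip_block *blocks, size_t count)
{
    size_t dumped = 0;

    for (size_t i = 0; i < count; i++) {
        if (blocks[i].funcs && blocks[i].funcs->dump_ip_state) {
            blocks[i].funcs->dump_ip_state(&blocks[i]);
            dumped++;
        }
    }
    return dumped;
}

/* Call when the coredump is read: append each IP's saved state to buf. */
static int format_ip_state(const struct ip_block *blocks, size_t count,
                           char *buf, size_t len)
{
    int used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (blocks[i].funcs && blocks[i].funcs->print_ip_state)
            used += blocks[i].funcs->print_ip_state(&blocks[i], buf + used,
                                                    len - (size_t)used);
    }
    return used;
}
```

Splitting capture from formatting mirrors the ordering the patches describe: the values are saved before the reset wipes the state, while the text is produced later from the saved copy. IPs whose callbacks are NULL are skipped in both passes.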
On 4/17/2024 1:19 PM, Lazar, Lijo wrote:
On 4/17/2024 1:14 PM, Khatri, Sunil wrote:
On 4/17/2024 1:06 PM, Khatri, Sunil wrote:
devcoredump is used to debug gpu hangs/resets. So in normal process
when there is a hang due to ring timeout or page fault we are doing a
hard reset as soft reset fa
Am 17.04.24 um 10:18 schrieb Sunil Khatri:
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 +
drivers/gpu
On 4/17/2024 2:15 PM, Christian König wrote:
Am 17.04.24 um 10:18 schrieb Sunil Khatri:
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++
drivers/gpu/drm/amd/
[AMD Official Use Only - General]
Ping...
Regards,
Stanley
> -----Original Message-----
> From: Yang, Stanley
> Sent: Friday, April 12, 2024 2:21 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Yang, Stanley
> Subject: [PATCH Review 1/1] drm/amdgpu: Support setting reset_method at
> runtime
>
> Si
Add the prototype to dump ip registers
for all ips of different asics and set
them to NULL for now. Based on the
requirement add a function pointer for
each of them.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_m
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 +
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 130 ++
Add the prototype for print ip state to be used
to print the registers in devcoredump during
a gpu reset.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
Invoke the dump_ip_state function for each ip before
the asic resets and save the register values for
debugging via devcoredump.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu
Add support to print ip information to be
used to print registers in devcoredump
buffer.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/
Add ip dump for each ip of the asic in the
devcoredump for all the ips where a callback
is registered for register dump.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 14 ++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Print the od status info if it's not supported.
Signed-off-by: Ma Jun
---
drivers/gpu/drm/amd/pm/amdgpu_pm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 5bc1cd4993e8..a20e03e69d38 100644
--- a/drivers/gpu/dr
On 4/17/2024 3:10 PM, Ma Jun wrote:
> Print the od status info if it's not supported.
>
> Signed-off-by: Ma Jun
> ---
> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> i
From: Alex Deucher
Makes it easier to review the logs when there are MES
errors.
v2: use dbg for emitted, add helpers for fetching strings
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 78 --
1 file changed, 74 insertions(+), 4 deletions(-)
From: Alex Deucher
This reverts commit a518c746510e03d8a78db432a659770182546b9e.
Reduce the time we wait since we are waiting for the
fence with the spinlock held.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
dif
The MES firmware expects synchronous operation with the
driver. For this to work asynchronously, each caller
would need to provide its own fence location and sequence
number.
For now, add a mutex lock to serialize the MES submission.
For SR-IOV long-wait case, break the long-wait to separated
par
On 4/17/2024 11:23 AM, Ma Jun wrote:
> gpu_od should be removed if it's an empty directory
>
> Signed-off-by: Ma Jun
> Reported-by: Yang Wang
> ---
> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 7 +++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> b/dr
HDP Flush request bit can be kept unique per AID, and doesn't need to be
unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2.
Signed-off-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgp
During mode-2 reset, pci config space registers are affected at device
side. However, certain platforms have switches which assign virtual BAR
addresses and return the same even after the device is reset. This
affects pci_restore_state() as it doesn't issue another config write, if
the value read is s
Am 12.04.24 um 08:21 schrieb Stanley.Yang:
Signed-off-by: Stanley.Yang
You are missing a commit message, without it the patch will
automatically be rejected when you try to push it.
With that added Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
1 file
Am 17.04.24 um 13:30 schrieb Horace Chen:
The MES firmware expects synchronous operation with the
driver. For this to work asynchronously, each caller
would need to provide its own fence location and sequence
number.
Well that's certainly not correct. The seqno takes care that we can wait
asy
smu v14.0.1 re-used smu v14.0.0
Signed-off-by: Li Ma
---
drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
index 3bc9662fbd28.
[AMD Official Use Only - General]
> -----Original Message-----
> From: amd-gfx On Behalf Of
> Sonny Jiang
> Sent: Monday, April 15, 2024 5:25 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Jiang, Sonny ; Jiang, Sonny
>
> Subject: [PATCH 1/2] drm/amdgpu: IB size alignment on VCN5
>
> From: Sonny Ji
On Wed, Apr 17, 2024 at 8:07 AM Lijo Lazar wrote:
>
> HDP Flush request bit can be kept unique per AID, and doesn't need to be
> unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2.
>
> Signed-off-by: Lijo Lazar
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 3 +
Hi Christian,
On 4/17/2024 12:19 PM, Christian König wrote:
Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam:
Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS fla
On Wed, Apr 17, 2024 at 9:02 AM Li Ma wrote:
>
> smu v14.0.1 re-used smu v14.0.0
>
> Signed-off-by: Li Ma
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/sw
[AMD Official Use Only - General]
-----Original Message-----
From: amd-gfx On Behalf Of Horace Chen
Sent: Wednesday, April 17, 2024 7:30 AM
To: amd-gfx@lists.freedesktop.org
Cc: Andrey Grodzovsky ; Kuehling, Felix
; Chen, Horace ; Koenig, Christian
; Deucher, Alexander ;
Xiao, Jack ; Zhang, Ha
[AMD Official Use Only - General]
> -----Original Message-----
> From: Christian König
> Sent: Wednesday, April 17, 2024 8:46 PM
> To: Yang, Stanley ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH Review 1/1] drm/amdgpu: Support setting reset_method
> at runtime
>
> Am 12.04.24 um 08:21 sch
Hi Christian,
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 12:19 PM, Christian König wrote:
Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam:
Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE
On 2023-11-02 00:21, Joshua Ashton wrote:
> Otherwise we can end up with a frame on unsuspend where color management
> is not applied when userspace has not committed themselves.
>
> Fixes re-applying color management on Steam Deck/Gamescope on S3 resume.
>
> Signed-off-by: Joshua Ashton
Going
On 2024-04-17 10:32, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 12:19 PM, Christian König wrote:
A
Hi Philip,
On 4/17/2024 8:58 PM, Philip Yang wrote:
On 2024-04-17 10:32, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 6:57 PM, Paneer Selvam, Arunpravin wrote:
Hi Christian,
On 4/17/2024 12:19 PM, Christian König wrote:
Am 17.04.24 um 08:21 schrieb Arunpravin Paneer Selvam:
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Add the prototype to dump ip registers
> for all ips of different asics and set
> them to NULL for now. Based on the
> requirement add a function pointer for
> each of them.
>
> Signed-off-by: Sunil Khatri
> ---
> drivers/gpu/drm/amd/amdgpu
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Add the prototype for print ip state to be used
> to print the registers in devcoredump during
> a gpu reset.
>
> Signed-off-by: Sunil Khatri
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgp
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Adding gfx10 gc registers to be used for register
> dump via devcoredump during a gpu reset.
>
> Signed-off-by: Sunil Khatri
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 ++
> drivers/gpu/drm/amd/
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Add support to print ip information to be
> used to print registers in devcoredump
> buffer.
>
> Signed-off-by: Sunil Khatri
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 -
> 1 file change
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Invoke the dump_ip_state function for each ip before
> the asic resets and save the register values for
> debugging via devcoredump.
>
> Signed-off-by: Sunil Khatri
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_devi
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>
> Add ip dump for each ip of the asic in the
> devcoredump for all the ips where a callback
> is registered for register dump.
>
> Signed-off-by: Sunil Khatri
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump
On 4/17/2024 9:21 PM, Alex Deucher wrote:
> On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
>>
>> Adding gfx10 gc registers to be used for register
>> dump via devcoredump during a gpu reset.
>>
>> Signed-off-by: Sunil Khatri
>
> Reviewed-by: Alex Deucher
>
>> ---
>> drivers/gpu/drm/am
On 4/17/2024 9:31 PM, Lazar, Lijo wrote:
On 4/17/2024 9:21 PM, Alex Deucher wrote:
On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote:
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.
Signed-off-by: Sunil Khatri
Reviewed-by: Alex Deucher
---
[AMD Official Use Only - General]
Yes, right now that API doesn't return anything. What I meant is to add that
check as well, since the coredump API is essentially used in hang situations.
In the past, access to registers while in GFXOFF resulted in a system hang
(basically it won't go beyond this point).
On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo wrote:
>
> [AMD Official Use Only - General]
>
> Yes, right now that API doesn't return anything. What I meant is to add that
> check as well as coredump API is essentially used in hang situations.
>
> Old times, access to registers while in GFXOFF res
On 4/17/2024 10:21 PM, Alex Deucher wrote:
On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo wrote:
[AMD Official Use Only - General]
Yes, right now that API doesn't return anything. What I meant is to add that
check as well as coredump API is essentially used in hang situations.
Old times, acc
On Wed, Apr 17, 2024 at 1:01 PM Khatri, Sunil wrote:
>
>
> On 4/17/2024 10:21 PM, Alex Deucher wrote:
> > On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo wrote:
> >> [AMD Official Use Only - General]
> >>
> >> Yes, right now that API doesn't return anything. What I meant is to add
> >> that check a
On 2024-04-16 10:10, Harry Wentland wrote:
On 2024-04-16 04:01, Pekka Paalanen wrote:
On Mon, 15 Apr 2024 18:33:39 -0400
Leo Li wrote:
On 2024-04-15 04:19, Pekka Paalanen wrote:
On Fri, 12 Apr 2024 16:14:28 -0400
Leo Li wrote:
On 2024-04-12 11:31, Alex Deucher wrote:
On Fri, Apr
[AMD Official Use Only - General]
I had a discussion with Christian about this before. The conclusion is that
the driver should prevent multiple processes from using the MES ring at the same
time. Also, for current MES ring usage, the driver doesn't have the logic to
prevent the ring been overf
[AMD Official Use Only - General]
Looks good to me.
Reviewed by Shaoyun.liu <shaoyun@amd.com>
-----Original Message-----
From: amd-gfx On Behalf Of Horace Chen
Sent: Wednesday, April 17, 2024 7:30 AM
To: amd-gfx@lists.freedesktop.org
Cc: Andrey Grodzovsky ; Kuehling, Felix
; Chen, Horace
[Public]
Acked-by: Alex Deucher
From: amd-gfx on behalf of Hawking
Zhang
Sent: Tuesday, April 16, 2024 1:56 AM
To: amd-gfx@lists.freedesktop.org ; Zhou1, Tao
Cc: Zhang, Hawking
Subject: [PATCH] drm/amdgpu: Use driver mode reset for data poison handling
mode
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 4eab358930711bbeb85bf5ee267d0d42d3394c2c Add linux-next specific
files for 20240417
Error/Warning reports:
https://lore.kernel.org/oe-kbuild-all/202404171743.hfpscodv-...@intel.com
https
On Wed, Apr 17, 2024 at 3:17 PM Liu, Shaoyun wrote:
>
> [AMD Official Use Only - General]
>
> I have a discussion with Christian about this before . The conclusion is
> that driver should prevent multiple process from using the MES ring at the
> same time . Also for current MES ring usage ,
Hi Dave, Sima,
Fixes for 6.9.
The following changes since commit 0bbac3facb5d6cc0171c45c9873a2dc96bea9680:
Linux 6.9-rc4 (2024-04-14 13:38:39 -0700)
are available in the Git repository at:
https://gitlab.freedesktop.org/agd5f/linux.git
tags/amd-drm-fixes-6.9-2024-04-17
for you to fetch c
Makes it easier to review the logs when there are MES
errors.
v2: use dbg for emitted, add helpers for fetching strings
v3: fix missing commas (Harish)
Reviewed by Shaoyun.liu (v2)
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 78 --
1 file ch
Host will initiate an FLR for all poison consumption.
Guest should wait for FLR message to re-init data exchange.
Signed-off-by: Zhigang Luo
---
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
b/drivers/gpu/drm/amd
I'm a bit late to the game but I don't think this is merged
yet.
On 2024-01-15 11:05, Andri Yngvason wrote:
> From: Werner Sembach
>
> Add a new general drm property "force color format" which can be used
> by userspace to tell the graphics driver which color format to use.
>
> Possible options
On 2024-01-15 11:05, Andri Yngvason wrote:
> From: Werner Sembach
>
> Remove unnecessary SIGNAL_TYPE_HDMI_TYPE_A check that was performed in the
> drm_mode_is_420_only() case, but not in the drm_mode_is_420_also() &&
> force_yuv420_output case.
>
> Without further knowledge if YCbCr 4:2:0 is
[AMD Official Use Only - General]
Reviewed-by: Hawking Zhang
Regards,
Hawking
-----Original Message-----
From: Luo, Zhigang
Sent: Wednesday, April 17, 2024 15:54
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Saye, Sashank
; Chan, Hing Pong ; Luo, Zhigang
Subject: [PATCH] drm/amdgpu:
On 4/17/2024 7:52 PM, Lazar, Lijo wrote:
>
>
> On 4/17/2024 11:23 AM, Ma Jun wrote:
>> gpu_od should be removed if it's an empty directory
>>
>> Signed-off-by: Ma Jun
>> Reported-by: Yang Wang
>> ---
>> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 7 +++
>> 1 file changed, 7 insertions(+)
>>
>>
On Wed, Apr 17, 2024 at 9:51 PM wangzhu wrote:
>
> Hi Greg, thanks for your reply. Since there is no patch to fix CVE-2023-52624
> in linux-5.10, there is a patch in the linux-6.7 branch, its commit is
> 2ef98c6d753a744e333b7e34b9cf687040fba57d ("drm/amd/display: Wake DMCUB before
> executing G
Add message fifo to handle RAS poison events.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 32 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 18 ++
2 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.
Prepare for logging ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 23 +
2 files changed, 56 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/driver
Add interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/am
Add poison creation handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 74 +++--
1 file changed, 69 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 64e6e20c6de7
Add interface to update umc v12_0 ecc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 6 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
Umc v12_0 converts error address.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 94 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 12
2 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/dri
Add delay work to retire bad pages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 36 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 3 +++
4 files ch
1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
be retired later.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 67 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7 ++-
driver
Retire bad pages for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 57 +-
1 file changed, 55 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 6c2b61ef5b5
Add condition check for amdgpu_umc_fill_error_record.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
3 files changed, 19 insertions(+), 4 deletions(
Prepare to handle pasid poison consumption.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 9 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 ---
drivers/gpu/drm/amd/amdgpu/am
Add poison consumption handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 ++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index c1f146d3e
support ACA logging ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index bd917eb6ea24..8df84feaf046 100644
--- a/drivers/gp
retired_page is page frame and should be expanded
to the full address when querying status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/am
Use new interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d1a2ab944b7d..dee66db10fa2
Handle case that dma_fence_get_rcu_safe returns NULL.
If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.
Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling
-
The vpe dpm settings should be done before firmware is loaded.
Otherwise, the frequency cannot be successfully raised.
Signed-off-by: Peyton Lee
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 2 +-
drivers/gpu/drm/amd/amdgpu/vpe_v6_1.c | 14 +++---
2 files changed, 8 insertions(+), 8 d
[AMD Official Use Only - General]
Reviewed-by: Yifan Zhang
Best Regards,
Yifan
-----Original Message-----
From: Ma, Li
Sent: Wednesday, April 17, 2024 8:52 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan
; Feng, Kenneth ; Gao, Likun
; Ma, Li
Subject: [PATCH] drm/a
[AMD Official Use Only - General]
Reviewed-by: Asad Kamal
Thanks & Regards
Asad
-----Original Message-----
From: Lazar, Lijo
Sent: Wednesday, April 17, 2024 5:32 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander
; Kamal, Asad
Subject: [PATCH] drm/amd/pm: Restore co
Am 17.04.24 um 21:21 schrieb Alex Deucher:
On Wed, Apr 17, 2024 at 3:17 PM Liu, Shaoyun wrote:
[AMD Official Use Only - General]
I have a discussion with Christian about this before . The conclusion is that
driver should prevent multiple process from using the MES ring at the same
time .
Am 17.04.24 um 19:30 schrieb Alex Deucher:
On Wed, Apr 17, 2024 at 1:01 PM Khatri, Sunil wrote:
On 4/17/2024 10:21 PM, Alex Deucher wrote:
On Wed, Apr 17, 2024 at 12:24 PM Lazar, Lijo wrote:
[AMD Official Use Only - General]
Yes, right now that API doesn't return anything. What I meant is