On 10/18/2024 5:09 PM, Felix Kuehling wrote:
On 2024-10-18 17:31, Chen, Xiaogang wrote:
On 10/18/2024 12:57 PM, Felix Kuehling wrote:
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
On 10/18/2024 5:07 PM, Felix Kuehling wrote:
On 2024-10-18 17:31, Chen, Xiaogang wrote:
On 10/18/2024 12:57 PM, Felix Kuehling wrote:
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
On 2024-10-18 16:21, Kent Russell wrote:
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why som
On 2024-10-18 17:31, Chen, Xiaogang wrote:
On 10/18/2024 12:57 PM, Felix Kuehling wrote:
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver
On 2024-10-18 17:31, Chen, Xiaogang wrote:
On 10/18/2024 12:57 PM, Felix Kuehling wrote:
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver
On 10/18/2024 2:14 PM, Felix Kuehling wrote:
On 2024-10-11 10:41, Xiaogang.Chen wrote:
From: Xiaogang Chen
kfd process kref count (process->ref) is initialized to 1 by kref_init. After
it is created there is no need to increase its kref. Instead, add the kfd process
kref at kfd process mmu notifier allo
On 2024-10-18 14:28, Felix Kuehling wrote:
On 2024-10-17 04:34, Victor Zhao wrote:
Make sure KFD_FENCE_INIT is written to fence_addr before pm_send_query_status
is called, to avoid qcm fence timeout caused by incorrect ord
On 10/18/2024 12:57 PM, Felix Kuehling wrote:
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver function as expected
during AMD gpu device
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zer
On 10/13/24 09:58, Simon Ser wrote:
On Thursday, October 3rd, 2024 at 22:01, Harry Wentland wrote:
From: Alex Hung
It is to be used to enable HDR by allowing userspace to create and pass
3D LUTs to kernel and hardware.
1. new drm_colorop_type: DRM_COLOROP_3D_LUT.
2. 3D LUT modes define h
It does not support fullscreen 3D.
Fixes: 336568de918e ("drm/amdgpu/swsmu: default to fullscreen 3D profile for
dGPUs")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/am
On 2024-10-11 10:41, Xiaogang.Chen wrote:
From: Xiaogang Chen
kfd process kref count (process->ref) is initialized to 1 by kref_init. After
it is created there is no need to increase its kref. Instead, add the kfd process kref at kfd
process mmu notifier allocation, since we decrease the ref at free_notifier
On Tue, Oct 15, 2024 at 12:37 PM Sharma, Shashank wrote:
>
>
> On 15/10/2024 16:58, Alex Deucher wrote:
> > On Tue, Oct 15, 2024 at 6:13 AM Sharma, Shashank
> > wrote:
> >> Hello Alex,
> >>
> >> On 14/10/2024 22:29, Deucher, Alexander wrote:
> >>
> >> [AMD Official Use Only - AMD Internal Distrib
On 2024-10-14 05:19, Lijo Lazar wrote:
In certain cases - ex: when a reset is required on initialization - XCP
manager won't have a valid partition mode. In such cases, use SPX as the
default selected mode for which partition configuration details are
populated.
Signed-off-by: Lijo Lazar
Repo
[Public]
> -----Original Message-----
> From: Kuehling, Felix
> Sent: Friday, October 18, 2024 2:43 PM
> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
> Cc: Cornwall, Jay
> Subject: Re: [PATCH] amdgpu: Don't print L2 status if there's nothing to print
>
>
> On 2024-10-18 11:12, Kent Russell
On 2024-10-18 11:12, Kent Russell wrote:
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why som
On 2024-10-17 04:34, Victor Zhao wrote:
Make sure KFD_FENCE_INIT is written to fence_addr before pm_send_query_status
is called, to avoid qcm fence timeout caused by incorrect ordering.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
drivers/gpu/drm/amd/
On 2024-10-18 10:09, Chen, Xiaogang wrote:
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver function as expected
during AMD gpu device plug/unplug.
When an AMD gpu device gets unplugged
On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma wrote:
>
> The MES FW expects us to allocate at least one page as context
> space to process gang and process related context data. This
> patch creates a joint object for the same, and calculates GPU
> space offsets of these spaces.
>
> V1: Addressed
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver function as expected during AMD
gpu device plug/unplug.
When an AMD gpu device gets unplugged, the kfd driver stops all queues from this device.
If there are user processes that still reference the render node, this device is marked as
invalid. kfd d
On Wed, Oct 16, 2024 at 1:47 PM Harry Wentland wrote:
>
>
>
> On 2024-09-16 14:23, Thomas Weißschuh wrote:
> > Hi Harry, Leo and other amdgpu maintainers,
> >
> > On 2024-08-24 20:33:53+, Thomas Weißschuh wrote:
> >> The value of "min_input_signal" returned from ATIF on a Framework AMD 13
> >>
It is safe to access dqm->sched status inside dqm_lock, no
race with gpu reset.
Reviewed-by: Philip Yang
On 2024-10-18 11:10, Shaoyun Liu wrote:
From: shaoyunl
Add back the kfd queues in start scheduling that were originally
removed on stop scheduling.
Sig
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Leo Liu
> -----Original Message-----
> From: amd-gfx On Behalf Of Lijo
> Lazar
> Sent: October 18, 2024 2:41 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Deucher, Alexander
> ; Bhardwaj, Rajneesh
> ; Errabolu
Ping?
On Tue, Oct 15, 2024 at 2:28 PM Alex Deucher wrote:
>
> Add messages to make it clear when a per ring reset
> happens. This is helpful for debugging and aligns with
> other reset methods.
>
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++
> 1 file ch
From: shaoyunl
Add back the kfd queues in start scheduling that were originally
removed on stop scheduling.
Signed-off-by: Shaoyun Liu
Reviewed-by: Felix Kuehling
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 40 +--
1 file changed, 37 insertions(+), 3 deletions(-)
diff --g
On 2024-10-18 01:31, Zhao, Victor wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Ping. Please help review.
Thanks,
Victor
-----Original Message-----
From: Victor Zhao
Sent: Thursda
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zer
On Fri, Oct 18, 2024 at 5:46 AM Lang Yu wrote:
>
> Free sg table when dma_map_sgtable() failed to avoid memory leak.
>
> Signed-off-by: Lang Yu
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a
[AMD Official Use Only - AMD Internal Distribution Only]
Good catch. Thanks. I will send out another review for that.
Regards
Shaoyun.liu
From: Yang, Philip
Sent: Thursday, October 17, 2024 3:47 PM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amd/amdkfd: add/re
[Public]
> -----Original Message-----
> From: SHANMUGAM, SRINIVASAN
> Sent: Thursday, October 17, 2024 9:56 PM
> To: Koenig, Christian ; Deucher, Alexander
>
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
>
> Subject: [PATCH v3] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2
>
> T
On 10/17/2024 4:04 PM, Felix Kuehling wrote:
On 2024-10-15 17:21, Xiaogang.Chen wrote:
From: Xiaogang Chen
The purpose of this patch is to have the kfd driver function as expected
during AMD gpu device plug/unplug.
When an AMD gpu device gets unplugged, the kfd driver stops all queues from
this devic
On 10/18/2024 7:08 PM, Christian König wrote:
Patches #2, #3 and #12 are Acked-by: Christian König
The rest are Reviewed-by: Christian König
Maybe give others till Monday to take a look as well, could be that
Alex, Lijo or somebody else points out that we are ignoring the suspend
return c
Am 18.10.24 um 15:26 schrieb Arunpravin Paneer Selvam:
Add gpu address support to seq64 alloc function.
Looks good to me, but when adding interfaces you should probably have
the user of this in the same patch set.
Regards,
Christian.
Signed-off-by: Arunpravin Paneer Selvam
---
drivers/
Patches #2, #3 and #12 are Acked-by: Christian König
The rest are Reviewed-by: Christian König
Maybe give others till Monday to take a look as well, could be that
Alex, Lijo or somebody else points out that we are ignoring the suspend
return code during XGMI hive reset for a good reason.
I
Before, every time fdinfo is queried we try to lock all the BOs in the
VM and calculate memory usage from scratch. This works okay if the
fdinfo is rarely read and the VMs don't have a ton of BOs. If either of
these conditions is not true, we get a massive performance hit.
In this new revision, we
Since on modern systems all of vram can be made visible anyways, to
simplify the new implementation, drop tracking of how much memory is
visible for now. If this is really needed we can add it back on top of
the new implementation, or just report all the BOs as visible.
Signed-off-by: Yunxiang Li
-
The old behavior reports the resident memory usage for this key and the
documentation says so as well. However, this was accidentally changed to
include buffers that were evicted.
Fixes: a2529f67e2ed ("drm/amdgpu: Use drm_print_memory_stats helper from
fdinfo")
Signed-off-by: Yunxiang Li
---
drive
amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all
callers have a reference to adev handy, so remove it for cleanliness.
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 2 +-
drivers/gpu/drm/amd/am
Right now every time the fdinfo is read, we go through the vm lists and
lock all the BOs to calculate the statistics. This causes a lot of lock
contention when the VM is actively used. It gets worse if there are a lot
of shared BOs or if there are a lot of submissions. We have seen
submissions lock-up
Add gpu address support to seq64 alloc function.
Signed-off-by: Arunpravin Paneer Selvam
---
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 10 --
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h | 3 ++-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/a
Before making a function call to resume, validate
the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_resume where
same checks and calls are repeated.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 13 ++---
drivers/gpu/drm/amd/amdg
We don't need to set the functions that aren't needed to NULL,
as global structure members are by default
set to zero, or NULL for pointers.
Cc: Leo Liu
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c |
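The patch description above relies on a standard C rule: members of an object with static storage duration that are not named in its initializer are zero-initialized, so pointer members come out as NULL. A minimal user-space sketch of that rule follows; `demo_ip_funcs`, `demo_hw_init` and `demo_count_set` are hypothetical names for illustration only, not the real amdgpu definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table, loosely modelled on a driver ops struct. */
struct demo_ip_funcs {
    int (*hw_init)(void *handle);
    int (*suspend)(void *handle);
    int (*soft_reset)(void *handle);
};

/* Hypothetical callback: does nothing and reports success. */
static int demo_hw_init(void *handle)
{
    (void)handle;
    return 0;
}

/*
 * Only hw_init is named here. Because the object has static storage
 * duration, suspend and soft_reset are implicitly NULL; writing
 * ".suspend = NULL" explicitly would be redundant.
 */
static const struct demo_ip_funcs demo_funcs = {
    .hw_init = demo_hw_init,
};

/* Returns how many callbacks in the table are set (non-NULL). */
static int demo_count_set(const struct demo_ip_funcs *f)
{
    return (f->hw_init != NULL) + (f->suspend != NULL) +
           (f->soft_reset != NULL);
}
```

With these definitions, `assert(demo_funcs.suspend == NULL)` holds and `demo_count_set(&demo_funcs)` counts only the one callback that was named.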
Before making a function call to hw_fini, validate
the function pointer like we do in sw_init.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 +-
1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu
Before making a function call to wait_for_idle,
validate the function pointer like we do in sw_init.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
In function amdgpu_device_ip_suspend_phase2 if
suspend call fails for an IP then abort there
and return error to caller.
A failed functionality of IP is critical and
we should not proceed.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
1 file changed, 1 insert
In function amdgpu_reset_xgmi_reset_on_init_suspend
if suspend call fails for an IP then abort there
and return error to caller.
A failed functionality of IP is critical and
we should not proceed.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
1 file changed, 1
Remove the dummy wait_for_idle functions for all
ip blocks.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 6 --
drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 --
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 --
drivers/gpu/drm/amd/a
Remove the dummy soft_reset functions for all
ip blocks.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 6 --
drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 --
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 --
drivers/gpu/drm/amd/amdg
Some of the function pointers of amdgpu_ip_funcs
are not used and are left commented out. Hence this
cleans up those which aren't used.
Cc: Leo Liu
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 274 --
drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 273
Remove the dummy suspend functions for all
ip blocks.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 --
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/cik.c | 6 --
drivers/gpu/drm/amd/amdgpu/si.c | 6 --
4
Remove the dummy resume functions for all
ip blocks.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 --
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
index 9b98b40ac4db..1383fd1644d
v6: Fixed the missing return statement on suspend and update the code
with V5 comments.
v5: Fixed review comments. Dropped hw_fini patch and need to look
further into why such functions exist. hw_init/hw_fini are mandatory
functions and we should have a valid definition.
v4: hw_init/hw_fi
Before making a function call to suspend, validate
the function pointer like we do in sw_init.
Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 11 ++
drivers/gpu/drm/amd/a
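The pattern described in the patch above, checking a callback pointer before calling it and routing repeated call sites through one helper, could look roughly like the sketch below. All names (`demo_ip_block`, `demo_ip_block_suspend`, `demo_suspend_all`) are hypothetical; this is not the actual amdgpu_ip_block_suspend implementation, which the snippet does not show. Treating a missing callback as success and walking the blocks in reverse order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ip_block;

/* Hypothetical per-block callback table. */
struct demo_ip_funcs {
    int (*suspend)(struct demo_ip_block *block);
};

/* Hypothetical IP block: a callback table plus a state flag. */
struct demo_ip_block {
    const struct demo_ip_funcs *funcs;
    int suspended;
};

/* Example callbacks: one succeeds, one fails with an error code. */
static int demo_suspend_ok(struct demo_ip_block *block)
{
    block->suspended = 1;
    return 0;
}

static int demo_suspend_fail(struct demo_ip_block *block)
{
    (void)block;
    return -5;
}

/*
 * Helper: call suspend only if the block provides one.
 * Assumption: a missing callback is not an error.
 */
static int demo_ip_block_suspend(struct demo_ip_block *block)
{
    if (!block->funcs || !block->funcs->suspend)
        return 0;
    return block->funcs->suspend(block);
}

/*
 * Suspend every block (reverse order is an assumption) and abort on
 * the first failure, returning the error to the caller.
 */
static int demo_suspend_all(struct demo_ip_block *blocks, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        int r = demo_ip_block_suspend(&blocks[i]);

        if (r)
            return r;
    }
    return 0;
}
```

Keeping the NULL check inside the helper means each call site only has to handle the return code, which is the repetition the patch description says it removes.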
On Fri, Oct 18, 2024 at 7:19 AM Christian König wrote:
>
> Am 18.10.24 um 11:33 schrieb Zhang, Jesse(Jie):
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Christian,
> >
> > -----Original Message-----
> > From: Koenig, Christian
> > Sent: Friday, October 18, 2024 4:47 PM
>
Am 18.10.24 um 14:46 schrieb Raag Jadav:
As far as I can see this makes the enum how to recover the device
superfluous because you will most likely always need a bus reset to get out
of this again.
That depends on the kind of fault the device has encountered and the bus it is
sitting on. There c
On 10/18/2024 4:40 PM, Christian König wrote:
Am 17.10.24 um 18:25 schrieb Sunil Khatri:
Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.
I strongly suggest to squash this patch and the next one together.
Sure. Noted
Signed-off-by: Sunil Khatri
---
Am 18.10.24 um 11:33 schrieb Zhang, Jesse(Jie):
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Christian,
-----Original Message-----
From: Koenig, Christian
Sent: Friday, October 18, 2024 4:47 PM
To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject
Am 17.10.24 um 18:25 schrieb Sunil Khatri:
v5: Fixed review comments. Dropped hw_fini patch and need to look
further into why such functions exist. hw_init/hw_fini are mandatory
functions and we should have a valid definition.
v4: hw_init/hw_fini functions are mandatory and raise error mes
Am 17.10.24 um 18:25 schrieb Sunil Khatri:
Use the helper function amdgpu_ip_block_resume where
same checks and calls are repeated.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +
2 files
Am 17.10.24 um 18:25 schrieb Sunil Khatri:
Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.
I strongly suggest to squash this patch and the next one together.
Signed-off-by: Sunil Khatri
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
driver
Am 17.10.24 um 18:43 schrieb Rodrigo Vivi:
On Thu, Oct 17, 2024 at 09:59:10AM +0200, Christian König wrote:
Purpose of this implementation is to provide drivers a generic way to
recover with the help of userspace intervention. Different drivers may
have different ideas of a "wedged device" depen
Free sg table when dma_map_sgtable() failed to avoid memory leak.
Signed-off-by: Lang Yu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 74a
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Christian,
-----Original Message-----
From: Koenig, Christian
Sent: Friday, October 18, 2024 4:47 PM
To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: add the command AMDGPU_I
Add nps_mode in RAS init_flag.
Signed-off-by: Candice Li
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +++
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 9 +
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
b/drivers/gpu/drm/a
Am 18.10.24 um 10:19 schrieb jesse.zh...@amd.com:
Not all ASICs support the queue reset feature.
Therefore, userspace can query this feature
via AMDGPU_INFO_QUEUE_RESET before validating a queue reset.
Why would UMDs need that information?
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/am
Not all ASICs support the queue reset feature.
Therefore, userspace can query this feature
via AMDGPU_INFO_QUEUE_RESET before validating a queue reset.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 +
include/uapi/drm/amdgpu_drm.h |
In the MI300 series, the doorbell will get corrupted in a multi-VF scenario. This
is a HW bug, see DEGGIGX90-5071 and SWDEV-480706 for details.
The fix is to set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 in multi-VF
mode.
Signed-off-by: Samuel Zhang
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c |
This was supposed to be an unlock instead of a lock. The original
code will lead to a deadlock.
Fixes: ee52489d1210 ("drm/amdgpu: Place NPS mode request on unload")
Signed-off-by: Dan Carpenter
---
From static analysis, not testing.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
1 file c
[ 22.120385] ------------[ cut here ]------------
[ 22.120389] WARNING: CPU: 13 PID: 11 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn30/dcn30_dpp.c:501
dpp3_deferred_update+0x106/0x330 [amdgpu]
[ 22.120484] Modules linked in: fuse michael_mic hid_jabra
ip6table_filter ip6_tables xt_LOG nf_
Hello,
Our static analysis tool has identified a potential null-pointer dereference or
redundant null check related to the wait-completion synchronization mechanism in
amdgpu_dm.c in Linux 6.11.
Consider the following execution scenario:
dmub_aux_setconfig_callback() //731
if (adev->d
70 matches