To fix a potential overflowed constant warning reported by Coverity,
change the variables to uint32_t.
Signed-off-by: Bob Zhou
---
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c
b/drivers/gpu
On 6/3/2024 11:42 PM, Eric Huang wrote:
> reset cause is requested by customer as additional
> info for gpu reset smi event.
>
> v2: integrate reset sources suggested by Lijo Lazar
>
> Signed-off-by: Eric Huang
This series is
Reviewed-by: Lijo Lazar
I think SMI needs to get all re
Hi Shaoyun,
see inline.
On 03.06.24 at 20:28, Liu, Shaoyun wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Thanks, Christian, for the detailed explanation.
I checked your patch; you try to use the query_scheduler_status package to check
the command completion. It may not work as
On 04.06.24 at 09:08, Bob Zhou wrote:
To fix a potential overflowed constant warning reported by Coverity,
change the variables to uint32_t.
Signed-off-by: Bob Zhou
Acked-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deleti
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into
descriptor table, only to have it looked up by file descriptor and
remove it from descriptor table is not just too convoluted - it's
racy; another thread might have modified the descriptor table while
we'd been going through tha
On 23.05.24 at 19:30, Armin Wolf wrote:
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook
On Mon, Jun 3, 2024, at 22:59, George Zhang wrote:
> This reverts commit 416b5c5eec9e708b31c68f00cb79130f2cfaf7ed.
>
> This patch caused a regression on DCN 3.2 on the IGT test
> assr-links-suspend, with
> the dmesg warning:
>
> BUG: sleeping function called from invalid context at
> include/linu
Use of close_fd() in cleanup on failure exits is
wrong; the descriptor table is a shared data structure, and
as soon as you've inserted a file reference there it is
entirely possible for another thread to have moved it
around, replaced it or removed it.
Fortunately, not many places are us
On Wed, 22 May 2024, Li Ma wrote:
> From: Basavaraj Natikar
>
> During the initialization sensors may take some time to respond. Hence,
> increase the sensor command timeouts in order to obtain status responses
> within a maximum timeout.
> (Li: backport for s0ix issue, these patches have landed
The if conditions !A || A && B can be simplified to !A || B.
Fixes the following Coccinelle/coccicheck warnings reported by
excluded_middle.cocci:
WARNING !A || A && B is equivalent to !A || B
WARNING !A || A && B is equivalent to !A || B
WARNING !A || A && B is equivalent
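As a minimal sketch of why the simplification above is safe, the following checks both forms of the condition over every combination of A and B. The function names are hypothetical and are not taken from the patch; only the boolean identity itself comes from the warning text.

```c
#include <assert.h>
#include <stdbool.h>

/* Shape flagged by excluded_middle.cocci: !A || A && B.
 * (Hypothetical helper, for illustration only.) */
static bool cond_verbose(bool a, bool b)
{
	return !a || (a && b);
}

/* Simplified shape suggested by the warning: !A || B. */
static bool cond_simplified(bool a, bool b)
{
	return !a || b;
}

/* Exhaustively compare both forms over the four input combinations. */
static void check_equivalence(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			assert(cond_verbose(a, b) == cond_simplified(a, b));
}
```

The two forms agree because the right-hand operand of || is only evaluated when !A is false, that is, when A is already known to be true, so the extra "A &&" adds nothing.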
Instead of trying to use close_fd() on failure exits, just have
criu_get_prime_handle() store the file reference without inserting
it into descriptor table.
Then, once the callers are past the last failure exit, they can go
and either insert all those file references into the corresponding
slots o
[AMD Official Use Only - AMD Internal Distribution Only]
The trouble with taking the read side lock in the MES helper functions is that
we use a lot of them during reset under the write lock. So either we need to
duplicate the helper functions or we will get inconsistencies where a random
subse
On Tue, Jun 4, 2024 at 8:57 AM Jiri Kosina wrote:
>
> On Wed, 22 May 2024, Li Ma wrote:
>
> > From: Basavaraj Natikar
> >
> > During the initialization sensors may take some time to respond. Hence,
> > increase the sensor command timeouts in order to obtain status responses
> > within a maximum t
[AMD Official Use Only - AMD Internal Distribution Only]
Acked-by: Leo Liu
> -----Original Message-----
> From: Wu, David
> Sent: Thursday, May 30, 2024 10:59 AM
> To: amd-gfx@lists.freedesktop.org; Koenig, Christian
>
> Cc: Deucher, Alexander ; Liu, Leo
> ; Jiang, Sonny ; Dong, Ruijing
>
> S
On Mon, Jun 3, 2024 at 5:07 PM George Zhang wrote:
>
> This reverts commit 416b5c5eec9e708b31c68f00cb79130f2cfaf7ed.
>
> This patch caused a regression on DCN 3.2 on the IGT test assr-links-suspend,
> with
> the dmesg warning:
>
> BUG: sleeping function called from invalid context at
> include/l
This can be called in atomic context. Should fix:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at
Thanks for your review, Lijo. I will send a patch with the reset source in
other places.
Regards,
Eric
On 2024-06-04 03:26, Lazar, Lijo wrote:
On 6/3/2024 11:42 PM, Eric Huang wrote:
reset cause is requested by customer as additional
info for gpu reset smi event.
v2: integrate reset sources s
On 04.06.24 at 15:50, Alex Deucher wrote:
This can be called in atomic context. Should fix:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8
preempt_count: 2, expected: 0
RCU nest
On Tue, Jun 4, 2024 at 10:32 AM Christian König
wrote:
>
> On 04.06.24 at 15:50, Alex Deucher wrote:
> > This can be called in atomic context. Should fix:
> >
> > BUG: sleeping function called from invalid context at
> > include/linux/sched/mm.h:306
> > in_atomic(): 1, irqs_disabled(): 0, non_b
This mirrors what the driver does for older DCN generations.
Should fix:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
To fulfill the reset event description.
Suggested-by: Lijo Lazar
Signed-off-by: Eric Huang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drive
Hi guys,
as already discussed on the mailing list, Tvrtko and Friedrich stumbled
over a bunch of problems with the memory management. In particular, the
move rate limit didn't seem to work for VRAM|GTT BOs, causing a bunch
of additional and unnecessary overhead during CS.
This (not well tested) pat
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.
Use a status query packet for the fence update instead. Since those
should always succeed, we can use the fence of the original packet to
signal the state of the operation.
Only
That is just a waste of time on APUs.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8d8c39be6129..f
That should probably come last.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index f7b534
From: Tvrtko Ursulin
Currently the driver appears to be thinking that it will be attempting to
re-validate the evicted buffers on the next submission if they are not in
their preferred placement.
That however appears not to be true for the very common case of buffers
with allowed placements of V
This adds support to enable a placement only when a certain threshold of
moved bytes is reached. It's a context flag which will be handled
together with TTM_PL_FLAG_DESIRED and TTM_PL_FLAG_FALLBACK.
Signed-off-by: Christian König
---
drivers/gpu/drm/ttm/ttm_bo.c | 5 ++---
drivers/gpu/drm/
This should prevent buffer moves when the threshold is reached during
CS.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 36 --
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 +
2 files changed, 29 insertions(+), 29 deletions(-)
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: d97496ca23a2d4ee80b7302849404859d9058bcd Add linux-next specific
files for 20240604
Error/Warning reports:
https://lore.kernel.org/oe-kbuild-all/202406041641.we3cct4c-...@intel.com
Error
[AMD Official Use Only - AMD Internal Distribution Only]
Tested-by: George Zhang
Thanks,
George
-----Original Message-----
From: Deucher, Alexander
Sent: Tuesday, June 4, 2024 11:50 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Mahfooz, Hamza
; Zhang, George ; Arnd Bergmann
;
This reverts commit 44069f0f9b1fe577c5d4f05fa9eb02db8c618adc since
the code path is called from FPU context, and triggers error like:
[ 26.924055] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
[ 26.924060] in_atomic(): 1, irqs_disabled(): 0, non_block: 0,
On 6/4/24 13:45, Aurabindo Pillai wrote:
This reverts commit 44069f0f9b1fe577c5d4f05fa9eb02db8c618adc since
the code path is called from FPU context, and triggers error like:
[ 26.924055] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
[ 26.924060] in_atom
This mirrors what the driver does for older DCN generations.
Should fix:
[ 26.924055] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
[ 26.924060] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1022,
name: modprobe
[ 26.924063] preempt_count: 2, e
This mirrors what the driver does for older DCN generations.
Should fix:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
On 2024-06-03 22:13, Al Viro wrote:
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into
descriptor table, only to have it looked up by file descriptor and
remove it from descriptor table is not just too convoluted - it's
racy; another thread might have modified the descriptor
On 2024-06-03 22:14, Al Viro wrote:
Instead of trying to use close_fd() on failure exits, just have
criu_get_prime_handle() store the file reference without inserting
it into descriptor table.
Then, once the callers are past the last failure exit, they can go
and either insert all those file r
On 2024-06-03 18:19, Armin Wolf wrote:
On 23.05.24 at 19:30, Armin Wolf wrote:
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and v
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Kuehling, Felix
> Sent: Tuesday, June 4, 2024 2:25 PM
> To: Armin Wolf ; Deucher, Alexander
> ; Koenig, Christian
> ; Pan, Xinhui ;
> gre...@linuxfoundation.org; sas...@kernel.org
> Cc: sta...@vger.kerne
On 2024-06-03 04:49, Jesse Zhang wrote:
idr_for_each_entry ensures that mem is not empty during the loop,
so there is no need to check mem again.
Signed-off-by: Jesse Zhang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 -----
1 file changed, 5 deletions(-)
d
On 2024-06-03 04:49, Jesse Zhang wrote:
The queue type can only be KFD_QUEUE_TYPE_DIQ or KFD_QUEUE_TYPE_HIQ,
and the default cannot be reached.
I wonder, if you remove the default case, I guess you are relying on the
compiler or a static checker to ensure that we can only pass valid enum
va
[Public]
> -----Original Message-----
> From: Arnd Bergmann
> Sent: Tuesday, June 4, 2024 3:43 PM
> To: Deucher, Alexander ; amd-
> g...@lists.freedesktop.org
> Cc: Zhang, George ; Mahfooz, Hamza
> ; Wentland, Harry
> ; Li, Sun peng (Leo) ;
> Siqueira, Rodrigo ; Aberback, Joshua
>
> Subject: Re:
This mirrors what the driver does for older DCN generations.
Should fix:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
This mirrors what the driver does for older DCN generations.
Should fix:
[ 26.924055] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:306
[ 26.924060] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1022,
name: modprobe
[ 26.924063] preempt_count: 2, e
On Tue, Jun 4, 2024 at 8:41 AM Greg Kroah-Hartman
wrote:
>
> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On
On 2024-05-22 18:05, Mario Limonciello wrote:
When the `power_saving_policy` property is set to bit mask
"Require color accuracy" ABM should be disabled immediately and
any requests by sysfs to update will return an -EBUSY error.
When the `power_saving_policy` property is set to bit mask
"Requ
On 2024-05-22 18:05, Mario Limonciello wrote:
The `power saving policy` DRM property is an optional property that
can be added to a connector by a driver.
This property is for compositors to indicate intent of policy of
whether a driver can use power saving features that may compromise
the ex
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Emily Deng
>-----Original Message-----
>From: Li, Yunxiang (Teddy)
>Sent: Friday, May 31, 2024 5:48 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deucher, Alexander ; Koenig, Christian
>; Li, Yunxiang (Teddy) ;
>Chang, HaiJun ; De
-20240604]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:
https://github.com/intel-lab-lkp/linux/commits/Christian-K-nig/d
If another thread accesses the gpu while the GPU is being reset, the
reset could fail. This is especially problematic on SRIOV since host
may reset the GPU even if guest is not yet ready.
There is code in place that tries to prevent stray access, but over
time bugs have crept in making it not rel
Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.
Signed-off-by: Yunxiang Li
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/amd/am
At this point the gart is not set up, there's no point to invalidate tlb
here and it could even be harmful.
Signed-off-by: Yunxiang Li
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amd
We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.
In the current state since we take tens of seconds to stop everything,
it is very likely that host would give up waiting and reset the GPU
before we send rea
We need to take the reset domain lock before talking to MES. In
this case we could take the lock inside the mes helper, but we can't do so for
most other mes helpers since they are used during reset. So for
consistency's sake we add the lock here.
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd
Here we are in reset and already hold the reset_domain write side lock,
so we can't use the flush tlb helper, which tries to take the read
side.
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 +---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
drivers/gpu/drm/amd
When the amdgpu_gart_invalidate_tlb helper was introduced, this part was left
out of the conversion. Avoid the code duplication here.
Signed-off-by: Yunxiang Li
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --
is_hws_hang and is_resetting serve pretty much the same purpose, and
they both duplicate the work of the reset_domain lock; just check that
directly instead. This also eliminates a few bugs listed below and gets
rid of dqm->ops.pre_reset.
kfd_hws_hang did not need to avoid scheduling another reset.
We need to take the reset domain lock before flushing hdp. We can't put the
lock inside amdgpu_device_flush_hdp itself because it is used during
reset, where we already hold the write side lock.
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 6 +-
1 file changed, 5 ins
Which method is used to flush tlb does not depend on whether a reset is
in progress or not. We should skip the flush altogether if the GPU will get
reset. So put both paths under the reset_domain read lock.
Signed-off-by: Yunxiang Li
Reviewed-by: Christian König
CC: sta...@vger.kernel.org
---
drivers/gp
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Felix
-----Original Message-----
From: Kuehling, Felix
Sent: Wednesday, June 5, 2024 2:45 AM
To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
; Huang, Tim
Subject: Re: [PATCH 10/12] drm/
[AMD Official Use Only - AMD Internal Distribution Only]
Series is
Reviewed-by: Hawking Zhang
Regards,
Hawking
-----Original Message-----
From: amd-gfx On Behalf Of Alex Deucher
Sent: Saturday, June 1, 2024 00:08
To: amd-gfx@lists.freedesktop.org
Cc: Min, Frank ; Gao, Likun ; Deucher,
Alexan
[AMD Official Use Only - AMD Internal Distribution Only]
Ping for the series...
> -----Original Message-----
> From: Zhou1, Tao
> Sent: Friday, May 31, 2024 6:49 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhou1, Tao
> Subject: [PATCH 1/5] drm/amdgpu: add RAS is_rma flag
>
> Set the flag to tr
[AMD Official Use Only - AMD Internal Distribution Only]
Please correct the commit subject before pushing the change
drma->drm
Regards,
Hawking
-----Original Message-----
From: amd-gfx On Behalf Of Tao Zhou
Sent: Friday, May 31, 2024 18:49
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao
Subje
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Hawking Zhang
Regards,
Hawking
-----Original Message-----
From: amd-gfx On Behalf Of Tao Zhou
Sent: Friday, May 31, 2024 18:49
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao
Subject: [PATCH 3/5] drm/amdgpu: create amdgpu_r