Fix the issue that the ras controller interrupt cannot be triggered again after
a single nbif uncorrectable error. The error count is also stored in the nbif ras
object for query.
Change-Id: Iba482c169fdff3e9c390072c0289a622a522133c
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 10 ++
Avoid changing the default reset behavior for production cards by checking
that amdgpu_ras_enable equals 2. Also, only new enough smu ucode can support
baco for the xgmi/ras case.
Change-Id: I07c3e6862be03e068745c73db8ea71f428ecba6b
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/soc15.c | 4 +++-
1 file
Change it to external interface.
Change-Id: I2ab61f149c84a05a6f883a4c7415ea8012ec03a6
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/am
Otherwise the next err_event_athub error cannot trigger a gpu reset. The following
resume sequence will not be affected by this flag.
v2: create function to clear amdgpu_ras_in_intr for modularity of ras driver
Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
Signed-off-by: Le Ma
---
drivers/gpu/drm
This operation is needed on baco entry/exit for ras recovery
Change-Id: I535c7231693f3138a8e3d5acd55672e2ac68232f
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 ---
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amd
From: Le Ma
v2: add a notification when the ras controller interrupt is generated
Change-Id: Ic03e42e9d1c4dab1fa7f4817c191a16e485b48a9
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbi
Move the print out of uvd instance loop in amdgpu_uvd_suspend
Change-Id: Ifad997debd84763e1b55d668e144b729598f115e
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
b/d
Currently each XGMI node reset wq does not run in parallel because the same work
item bound to the same cpu runs in sequence. So bind the xgmi_reset_work
item to different cpus.
XGMI requires all nodes to enter baco within very close proximity before
any node exit baco. So schedule the xgmi_
This athub fatal error can be recovered by baco without system-level reboot,
so add a mode to use baco for the recovery. This does not affect the default psp
reset situations for now.
Change-Id: Ib17f2a39254ff6b0473a785752adfdfea79d0e0d
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |
Use the shared address space of the drm device (see drm_open() in
drm_file.c) for dma-bufs too. That removes a difference between drm
device mmap vmas and dma-buf mmap vmas and fixes corner cases like
dropping ptes (using madvise(DONTNEED) for example) not working
properly.
Also remove amdgpu dri
[AMD Official Use Only - Internal Distribution Only]
-Original Message-
From: Le Ma
Sent: Wednesday, November 27, 2019 5:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Chen, Guchun ;
Zhou1, Tao ; Li, Dennis ; Deucher,
Alexander ; Ma, Le
Subject: [PATCH 10/10] drm/amdg
-Original Message-
From: Chen, Guchun
Sent: Wednesday, November 27, 2019 5:50 PM
To: Ma, Le ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Zhou1, Tao ; Li,
Dennis ; Deucher, Alexander ; Ma,
Le
Subject: RE: [PATCH 10/10] drm/amdgpu: reduce redundant uvd context lost
warning me
Move the print out of uvd instance loop in amdgpu_uvd_suspend
v2: drop unnecessary brackets
Change-Id: Ifad997debd84763e1b55d668e144b729598f115e
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/driver
On 27.11.19 at 11:02, Le Ma wrote:
Move the print out of uvd instance loop in amdgpu_uvd_suspend
v2: drop unnecessary brackets
Change-Id: Ifad997debd84763e1b55d668e144b729598f115e
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 ++
1 file changed, 6 insertions
-Original Message-
From: Christian König
Sent: Wednesday, November 27, 2019 6:08 PM
To: Ma, Le ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Zhou1, Tao ;
Deucher, Alexander ; Li, Dennis ;
Zhang, Hawking
Subject: Re: [PATCH 10/10 v2] drm/amdgpu: reduce redundant uvd context lost
Move the print out of uvd instance loop in amdgpu_uvd_suspend
v2: drop unnecessary brackets
v3: grab the ras_intr state once and reuse it
Change-Id: Ifad997debd84763e1b55d668e144b729598f115e
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 11 +++
1 file changed,
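The two changes named in this changelog (printing the warning once instead of once per UVD instance, and reading the ras_intr state a single time) can be sketched in plain C as follows. This is only an illustration under stated assumptions: `suspend_uvd_instances`, its parameters, and the warning text are hypothetical stand-ins, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the suspend path. ras_intr_triggered models
 * the RAS interrupt state; the real driver obtains it differently.
 * Returns the number of warnings printed.
 */
static int suspend_uvd_instances(int num_inst, bool ras_intr_triggered)
{
	/* Grab the state once and reuse it below. */
	const bool in_ras_intr = ras_intr_triggered;
	int warnings = 0;

	for (int i = 0; i < num_inst; i++) {
		if (in_ras_intr)
			continue;	/* nothing reliable to save for this instance */
		/* placeholder: save the state of instance i here */
	}

	/* Warn once, outside the per-instance loop. */
	if (in_ras_intr) {
		fprintf(stderr, "UVD context may be lost due to a RAS interrupt\n");
		warnings++;
	}
	return warnings;
}
```

With two instances and the interrupt state set, the sketch prints a single warning rather than two.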
[AMD Public Use]
After thinking about it a bit, I think we can just rely on the PMFW version to
decide whether to go with RAS recovery or legacy fatal_error handling for the
platforms that support RAS. Leveraging amdgpu_ras_enable as a temporary solution
does not seem necessary. Even if baco ras recovery is not stable, it is t
[AMD Public Use]
And it is still necessary to put all the condition checks in one function. I mean
a function that decides whether to go with ras recovery or legacy fatal_error
handling. The PMFW version that supports RAS recovery will differ among ASICs. The
current version check only works for VG20. In fact,
[AMD Official Use Only - Internal Distribution Only]
Please check my comments inline
Regards,
Hawking
-Original Message-
From: Le Ma
Sent: November 27, 2019 17:15
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Chen, Guchun ;
Zhou1, Tao ; Li, Dennis ; Deucher,
Alexander ; Ma, Le
From: Zhang, Hawking
Sent: Wednesday, November 27, 2019 8:04 PM
To: Ma, Le ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Zhou1, Tao ; Li,
Dennis ; Deucher, Alexander ; Ma,
Le
Subject: RE: [PATCH 05/10] drm/amdgpu: enable/disable doorbell interrupt in
baco entry/exit helper
Please chec
I agree with your suggestion that we drop the amdgpu_ras_enable=2 condition. The
only concern on my side is that, besides fatal_error, another failure may occur:
an atombios_init timeout on xgmi with baco (not sure whether psp mode1 reset
causes this as well).
Assuming no amdgpu_ras_enable=2 check, if PMFW > 4
Switch to the baco reset method for ras recovery if a baco-capable PMFW is ready.
If not, keep the original reset method.
Change-Id: I07c3e6862be03e068745c73db8ea71f428ecba6b
Signed-off-by: Le Ma
---
drivers/gpu/drm/amd/amdgpu/soc15.c | 18 --
1 file changed, 8 insertions(+), 10 deletio
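The decision discussed in this thread (use baco for ras recovery only when the PMFW is new enough, with the required version differing per ASIC, and otherwise keep the original reset method) can be gathered into one function, roughly as sketched below. The enum, the table, the function name, and the version number are all hypothetical placeholders; the real minimum PMFW versions are not given in these messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASIC identifiers, for illustration only. */
enum asic_type { ASIC_VG20, ASIC_OTHER };

struct baco_ras_rule {
	enum asic_type asic;
	uint32_t min_pmfw_version;	/* placeholder value, not a real version */
};

/* One entry per ASIC whose PMFW can do baco-based ras recovery. */
static const struct baco_ras_rule baco_ras_rules[] = {
	{ ASIC_VG20, 0x00100000 },	/* placeholder */
};

/* Returns true for baco ras recovery, false for the original reset method. */
static bool use_baco_for_ras_recovery(enum asic_type asic, uint32_t pmfw_version)
{
	for (size_t i = 0; i < sizeof(baco_ras_rules) / sizeof(baco_ras_rules[0]); i++) {
		if (baco_ras_rules[i].asic == asic)
			return pmfw_version >= baco_ras_rules[i].min_pmfw_version;
	}
	return false;	/* unknown ASIC: keep the original reset method */
}
```

Keeping the per-ASIC thresholds in a table means supporting another ASIC only adds a row, which is one way to address the concern that the current check only works for VG20.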
Hi Hawking,
Please check the v2 patch that was just sent out. As discussed, we decided
to keep using the current reset_method() function, balancing functionality,
change scale, and code maintainability. Thanks.
Regards,
Ma Le
-Original Message-
From: Zhang, Hawking
Sent: Wednes
Hi Christian,
As you know, we're working on the HMM enablement. I'm working on the dGPU page
table entry invalidation for the userptr mapping case. Currently, the MMU
notifier handler stops all user mode queues and schedules a delayed worker to
re-validate userptr mappings and restart the queues.
Pa
Hi Alejandro,
yes I'm very aware of this issue, but unfortunately can't give an easy
solution either.
I've been working for over a year now on getting this fixed, but unfortunately
it turned out that this problem is much bigger than initially thought.
Setting the appropriate GFP flags for the job
Ping...
Andrey
On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
On 11/26/19 4:08 AM, Christian König wrote:
On 25.11.19 at 17:51, Steven Price wrote:
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
When the sched thread is parked we assume ring_mirror_list is
not accessed from here.
FWIW I don
On 27.11.19 at 16:32, Andrey Grodzovsky wrote:
Ping...
Andrey
On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
On 11/26/19 4:08 AM, Christian König wrote:
On 25.11.19 at 17:51, Steven Price wrote:
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
When the sched thread is parked we assume ring_
On 11/27/19 4:15 AM, Le Ma wrote:
Currently each XGMI node reset wq does not run in parallel because the same work
item bound to the same cpu runs in sequence. So bind the xgmi_reset_work
item to different cpus.
It's not the same work item, see more below
XGMI requires all nodes enter
From: Emil Velikov
Current validation requires that we're authenticated, even though we can
bypass (by design) the authentication when using a render node.
Let's address the former by following the design decision.
v2: Add simpler validation in the ioctls themselves (Boris)
Cc: Alex Deucher
C
On Wed, 27 Nov 2019 at 07:41, Boris Brezillon
wrote:
>
> Hi Emil,
>
> On Fri, 1 Nov 2019 13:03:13 +
> Emil Velikov wrote:
>
> > From: Emil Velikov
> >
> > As mentioned by Christian, for drivers which support only primary nodes
> > this changes the returned error from -EACCES into -EOPNOTSUP
On 2019-11-26 4:32 p.m., Zhan Liu wrote:
[Why]
NV14 is using its own ip params that are different from other
DCN2.0 ASICs.
[How]
Add ASIC revision check to make sure NV14 gets correct
ip params.
Signed-off-by: Zhan Liu
Reviewed-by: Nicholas Kazlauskas
---
drivers/gpu/drm/amd/display/dc/dc
On 2019-11-20 12:22 p.m., Colin King wrote:
> From: Colin Ian King
>
> The msg_id field is being assigned twice. Fix this by replacing the second
> assignment with an assignment to msg_size.
>
> Addresses-Coverity: ("Unused value")
> Fixes: 11a00965d261 ("drm/amd/display: Add PSP block to verify
Fixes coccicheck warning:
drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c:502:5-11: Unneeded variable:
"result". Return "0" on line 515
Reported-by: Hulk Robot
Signed-off-by: zhengbin
---
drivers/gpu/drm/amd/powerplay/hwmgr/vega12_hwmgr.c | 4 +---
1 file changed, 1 insertion(+), 3 deletio
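The kind of cleanup this coccicheck warning asks for can be illustrated with a minimal before/after pair. The function names and body below are hypothetical and are not taken from vega12_hwmgr.c; they only show the pattern of a variable that is set to 0 and returned unchanged.

```c
#include <assert.h>

/* Before: 'result' is initialized to 0 and never modified. */
static int set_defaults_before(int *value)
{
	int result = 0;

	*value = 1;
	return result;
}

/* After: the variable is dropped and 0 is returned directly. */
static int set_defaults_after(int *value)
{
	*value = 1;
	return 0;
}
```

Both versions behave identically; the second simply removes the unneeded local.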
Fixes coccicheck warning:
drivers/gpu/drm/amd/powerplay/amdgpu_smu.c:1192:5-8: Unneeded variable: "ret".
Return "0" on line 1195
drivers/gpu/drm/amd/powerplay/amdgpu_smu.c:1945:5-8: Unneeded variable: "ret".
Return "0" on line 1961
Reported-by: Hulk Robot
Signed-off-by: zhengbin
---
drivers/
Fixes coccicheck warning:
drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c:1154:5-11: Unneeded variable:
"result". Return "0" on line 1159
Reported-by: Hulk Robot
Signed-off-by: zhengbin
---
drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletion
Fixes coccicheck warning:
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:5188:5-8: Unneeded variable:
"ret". Return "0" on line 5196
Reported-by: Hulk Robot
Signed-off-by: zhengbin
---
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Fixes coccicheck warning:
drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c:4363:5-11: Unneeded
variable: "result". Return "0" on line 4370
Reported-by: Hulk Robot
Signed-off-by: zhengbin
---
drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 3 +--
1 file changed, 1 insertion(+), 2 deleti
zhengbin (5):
drm/amd/powerplay: Remove unneeded variable 'result' in smu10_hwmgr.c
drm/amd/powerplay: Remove unneeded variable 'result' in vega10_hwmgr.c
drm/amd/powerplay: Remove unneeded variable 'ret' in smu7_hwmgr.c
drm/amd/powerplay: Remove unneeded variable 'result' in vega12_hwmgr.c
On Wed, Nov 27, 2019 at 04:27:29PM +, Emil Velikov wrote:
> On Wed, 27 Nov 2019 at 07:41, Boris Brezillon
> wrote:
> >
> > Hi Emil,
> >
> > On Fri, 1 Nov 2019 13:03:13 +
> > Emil Velikov wrote:
> >
> > > From: Emil Velikov
> > >
> > > As mentioned by Christian, for drivers which support
On Wed, 27 Nov 2019 at 18:04, Daniel Vetter wrote:
>
> On Wed, Nov 27, 2019 at 04:27:29PM +, Emil Velikov wrote:
> > On Wed, 27 Nov 2019 at 07:41, Boris Brezillon
> > wrote:
> > >
> > > Hi Emil,
> > >
> > > On Fri, 1 Nov 2019 13:03:13 +
> > > Emil Velikov wrote:
> > >
> > > > From: Emil
On Wed, Nov 27, 2019 at 06:32:56PM +, Emil Velikov wrote:
> On Wed, 27 Nov 2019 at 18:04, Daniel Vetter wrote:
> >
> > On Wed, Nov 27, 2019 at 04:27:29PM +, Emil Velikov wrote:
> > > On Wed, 27 Nov 2019 at 07:41, Boris Brezillon
> > > wrote:
> > > >
> > > > Hi Emil,
> > > >
> > > > On Fri
On Tue, Nov 26, 2019 at 9:03 PM Luben Tuikov wrote:
>
> Implement an accessor of adev->tmz.enabled. Surrounding
> code should not access it as "if (adev->tmz.enabled)",
> as the organization may change. Instead...
>
> Recruit "bool amdgpu_is_tmz(adev)" to return
> exactly this Boolean value. That is, this
So it's not mixed up with the CTX stuff.
Signed-off-by: Alex Deucher
---
include/uapi/drm/amdgpu_drm.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f75c6957064d..918ac3548cd3 100644
--- a/include/uap
> -Original Message-
> From: amd-gfx On Behalf Of Alex
> Deucher
> Sent: 2019/November/27, Wednesday 3:57 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: [PATCH] drm/amdgpu: move CS secure flag next the structs where it's
> used
>
> So it's not mixed up with
On 2019-11-27 3:37 p.m., Alex Deucher wrote:
> On Tue, Nov 26, 2019 at 9:03 PM Luben Tuikov wrote:
>>
>> Implement an accessor of adev->tmz.enabled. Surrounding
>> code should not access it as "if (adev->tmz.enabled)",
>> as the organization may change. Instead...
>>
>> Recruit "bool amdgpu_is_tmz(adev)" t
Implement an accessor of adev->tmz.enabled. Surrounding
code should not access it as "if (adev->tmz.enabled)",
as the organization may change. Instead...
Recruit "bool amdgpu_is_tmz(adev)" to return
exactly this Boolean value. That is, this function
is now an accessor of an already initialized and
set adev
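A minimal sketch of the accessor pattern described here, using simplified mock structures: `amdgpu_is_tmz()` and `adev->tmz.enabled` are named in the message, but the struct layouts below are assumptions made only so the example is self-contained, and the real signature may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified mock types; the real amdgpu structures are far larger. */
struct amdgpu_tmz {
	bool enabled;
};

struct amdgpu_device {
	struct amdgpu_tmz tmz;
};

/* Callers use this instead of reading adev->tmz.enabled directly. */
static inline bool amdgpu_is_tmz(const struct amdgpu_device *adev)
{
	return adev->tmz.enabled;
}
```

If the field later moves or is derived differently, only the accessor body has to change, not every "if (adev->tmz.enabled)" call site.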
Ping
_
Monk Liu|GPU Virtualization Team |AMD
-Original Message-
From: Monk Liu
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH 1/5] drm/amdgpu: fix GFX10 missing CSIB set
still need to init
ping
_
Monk Liu|GPU Virtualization Team |AMD
-Original Message-
From: Monk Liu
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH 3/5] drm/amdgpu: do autoload right after MEC loaded for SRIOV VF
_
Monk Liu|GPU Virtualization Team |AMD
-Original Message-
From: Monk Liu
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH 2/5] drm/amdgpu: skip rlc ucode loading for SRIOV gfx10
Signed-off-b
ping
_
Monk Liu|GPU Virtualization Team |AMD
-Original Message-
From: Monk Liu
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH 4/5] drm/amdgpu: use CPU to flush vmhub if sched stopped
otherws
ping
_
Monk Liu|GPU Virtualization Team |AMD
-Original Message-
From: Monk Liu
Sent: Tuesday, November 26, 2019 7:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH 5/5] drm/amdgpu: fix calltrace during kmd unload
kernel would re
Christian
>> Good catch, but you are somehow messing up the indentation here.
I cannot align with the indentation, because my coding style check script (we
use it to push code to gerritgit) requires me to use "tab" instead of "space".
It means the current coding style is in fact wrong.
_
Hi Xiaojie
For SRIOV we don't use suspend, so I didn't think of that part. Thanks for the
reminder!
But we still need to fix this call trace issue anyway (our jenkins testing
system considers such a call trace an error).
How about we do " adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);" in
[AMD Official Use Only - Internal Distribution Only]
Hi Monk,
As long as the content of CSIB won't be changed by CP FW in runtime, I have no
objection to 're-initialize after S3 resume'.
I am not quite sure about the actual behavior, let me do an experiment to
confirm that and add Hawking / Jac
The kernel would report a warning on a double unpin
of the csb BO because we unpin it during hw_fini,
but we don't actually need to pin/unpin it during
hw_init/fini since it is created kernel-pinned.
v2:
call get_csb in init_rlc so hw_init() restores the CSIB content
even after reset or s3.
take care of
On Wed, 21 Jun 2017 at 00:03, Marek Olšák wrote:
>
> On Tue, Jun 20, 2017 at 1:46 PM, Christian König
> wrote:
> > On 20.06.2017 at 12:34, Marek Olšák wrote:
> >>
> >> BTW, I noticed the flush sequence in the kernel is wrong. The correct
> >> flush sequence should be:
> >>
> >> 1) EVENT_WRITE_EO
[AMD Official Use Only - Internal Distribution Only]
With the v2 versions of patches #6 and #7 and the fix to enable the doorbell
interrupt after BACO exit in patch #5, the series is
Reviewed-by: Hawking Zhang
Regards,
Hawking
-Original Message-
From: Le Ma
Sent: November 27, 2019 17:15
To: amd-gfx@lis
> -Original Message-
> From: Le Ma
> Sent: November 27, 2019 17:15
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Chen, Guchun
> ; Zhou1, Tao ; Li, Dennis
> ; Deucher, Alexander
> ; Ma, Le
> Subject: [PATCH 05/10] drm/amdgpu: enable/disable doorbell interrupt in
> baco entry/exit
61 matches