What about removing (kthread_should_park())? We decided it's useless as far as I remember.
Andrey
From: amd-gfx on behalf of Liu, Monk
Sent: 31 August 2021 20:24
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: RE:
AFAIK this one is independent.
Christian, can you confirm?
Andrey
From: amd-gfx on behalf of Alex Deucher
Sent: 14 September 2021 15:33
To: Christian König
Cc: Liu, Monk ; amd-gfx list ;
Maling list - DRI developers
Subject: Re: [PATCH 1/2] drm/sched: fix t
Just a gentle ping.
Andrey
From: Grodzovsky, Andrey
Sent: 26 January 2022 10:52
To: Christian König ; Koenig, Christian
; Lazar, Lijo ;
dri-de...@lists.freedesktop.org ;
amd-gfx@lists.freedesktop.org ; Chen, JingWen
Cc: Chen, Horace ; Liu, Monk
Subject: Re
21:41
To: Grodzovsky, Andrey ; Christian König
; Koenig, Christian
; Lazar, Lijo ;
dri-de...@lists.freedesktop.org ;
amd-gfx@lists.freedesktop.org ; Chen, JingWen
Cc: Chen, Horace ; Liu, Monk
Subject: Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
Hi Andrey,
I don
Is libdrm on GitLab? I wasn't aware of this. I assumed code reviews still go
through dri-devel.
Andrey
From: Alex Deucher
Sent: 03 June 2021 17:20
To: Grodzovsky, Andrey
Cc: Maling list - DRI developers ; amd-gfx
list ; Deucher, Alexander
; Christian
Ping
Andrey
From: amd-gfx on behalf of Andrey
Grodzovsky
Sent: 27 August 2020 10:54
To: Alex Deucher
Cc: Deucher, Alexander ; Das, Nirmoy
; amd-gfx list
Subject: Re: [PATCH 5/7] drm/amdgpu: Fix consecutive DPC recoveries failure.
On 8/26/20 11:20 AM, Alex D
It's based on v5.9-rc2 but won't apply cleanly since there is a significant
amount of amd-staging-drm-next patches which this was applied on top of.
Andrey
From: Bjorn Helgaas
Sent: 02 September 2020 17:36
To: Grodzovsky, Andrey
C
Reviewed-by: Andrey Grodzovsky
Andrey
From: amd-gfx on behalf of Emily.Deng
Sent: 07 October 2020 21:35
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily
Subject: [PATCH] drm/amdgpu: Remove warning for virtual_display
Remove the virtual_display warning in
Ping for both patches.
Andrey
From: Andrey Grodzovsky
Sent: 14 October 2020 13:24
To: amd-gfx@lists.freedesktop.org
Cc: Kazlauskas, Nicholas ; Wentland, Harry
; Pan, Xinhui ; Grodzovsky, Andrey
Subject: [PATCH 2/2] drm/amd/display: Avoid MST manager resource
with 1 and MAX_INT.
Andrey
From: Zhang, Jack (Jian)
Sent: 10 March 2021 22:05
To: Grodzovsky, Andrey ;
amd-gfx@lists.freedesktop.org ; Koenig,
Christian ; Liu, Monk ; Deng, Emily
Subject: RE: [PATCH v6] drm/amd/amdgpu implement tdr advanced mode
Hey Daniel, just a ping on a bunch of questions I posted below.
Andrey
From: Grodzovsky, Andrey
Sent: 25 November 2020 14:34
To: Daniel Vetter ; Koenig, Christian
Cc: r...@kernel.org ; daniel.vet...@ffwll.ch
; dri-de...@lists.freedesktop.org
; e
Hey, just a ping on my comments/questions below.
Andrey
From: Grodzovsky, Andrey
Sent: 25 November 2020 12:39
To: Daniel Vetter
Cc: amd-gfx list ; dri-devel
; Christian König
; Rob Herring ; Lucas Stach
; Qiang Yu ; Anholt, Eric
; Pekka Paalanen ; Deucher
Ok, I guess I will proceed with the dummy pages list implementation then.
Andrey
From: Koenig, Christian
Sent: 08 January 2021 09:52
To: Grodzovsky, Andrey ; Daniel Vetter
Cc: amd-gfx@lists.freedesktop.org ;
dri-de...@lists.freedesktop.org ;
daniel.vet
On 10/16/19 11:55 PM, Quan, Evan wrote:
> This is a quick and low-risk fix. Those APIs which
> are exposed to other IPs or to support sysfs/hwmon
> interfaces or DAL will have lock protection. Meanwhile
> no lock protection is enforced for swSMU internally used
> APIs. Future optimization is needed.
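For illustration, a minimal sketch of the locking pattern that commit message describes - exposed entry points lock, internal helpers don't. Function names here are made up, not the actual swSMU code:

/* Illustrative only: an exposed swSMU entry point takes the smu mutex
 * on behalf of external callers (other IPs, sysfs/hwmon, DAL);
 * smu_internal_get_power_limit is a hypothetical internal helper that
 * assumes the caller already holds the lock and takes none itself. */
int smu_get_power_limit(struct smu_context *smu, uint32_t *limit)
{
	int ret;

	mutex_lock(&smu->mutex);
	ret = smu_internal_get_power_limit(smu, limit);
	mutex_unlock(&smu->mutex);

	return ret;
}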
Hey Felix - I see this on boot when working with Arcturus.
Andrey
[ 103.602092] kfd kfd: Allocated 3969056 bytes on gart
[ 103.610769]
==
[ 103.611469] BUG: KASAN: stack-out-of-bounds in
kfd_create_vcrat_image_gpu+0x5db/0xb80 [a
ed
> here hasn't changed recently.
>
> Are you using some weird kernel config with a smaller stack? Is it
> specific to a compiler version or some optimization flags? I've
> sometimes seen function inlining cause excessive stack usage.
>
> Regards,
> Felix
>
>
On 10/18/19 1:00 AM, Quan, Evan wrote:
>
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: Thursday, October 17, 2019 10:22 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd/powerplay: add lock protection for swSMU APIs
>
>
s.freedesktop.org
> Cc: Chen, Guchun ; Zhou1, Tao ;
> Deucher, Alexander ; noreply-conflue...@amd.com;
> Quan, Evan ; Grodzovsky, Andrey
> Subject: [PATCH 4/4] drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU
> ready.
>
> For Arcturus the I2C traffic is done through
Deucher, Alexander ; noreply-conflue...@amd.com;
> Quan, Evan ; Grodzovsky, Andrey
> Subject: [PATCH 2/4] drm/amd/powerplay: Add EEPROM I2C read/write support to
> Arcturus.
>
> The communication is done through SMU table and hence the code is in
> powerplay.
>
> Sig
Friday, October 18, 2019 4:55 PM
> To: Grodzovsky, Andrey
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>> Not that I'm aware of, is there a special Kconfig flag to determine
>&g
I don't know - what Kconfig flag should I look at?
Andrey
On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry I meant is the kernel stack size 16KB in your kconfig?
>
> Oak
>
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: Tuesday, October 22, 2019 12:49 PM
t to know whether
> this is mi100 specific issue.
>
> Oak
>
> -----Original Message-
> From: Grodzovsky, Andrey
> Sent: Tuesday, October 22, 2019 1:28 PM
> To: Zeng, Oak ; Kuehling, Felix
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD o
On 10/22/19 2:28 PM, Yang, Philip wrote:
> If device reset/suspend/resume failed for some reason, the dqm lock is
> held forever and this causes a deadlock. Below is a kernel backtrace when
> an application opens kfd after suspend/resume failed.
>
> Instead of holding dqm lock in pre_reset and releasing dqm
On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
> On 10/22/19 2:28 PM, Yang, Philip wrote:
>> If device reset/suspend/resume failed for some reason, the dqm lock is
>> held forever and this causes a deadlock. Below is a kernel backtrace when
>> an application opens kfd after
019 4:48 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Chen, Guchun ; Zhou1, Tao
>> ; Deucher, Alexander ;
>> noreply-conflue...@amd.com; Quan, Evan ;
>> Grodzovsky, Andrey
>> Subject: [PATCH 2/4] drm/amd/powerplay: Add EEPROM I2C read/write
>> support
On 10/22/19 3:19 PM, Yang, Philip wrote:
>
> On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>>> On 10/22/19 2:28 PM, Yang, Philip wrote:
>>>> If device reset/suspend/resume failed for some reason, the dqm lock is
On 10/22/19 4:04 PM, Yang, Philip wrote:
>
> On 2019-10-22 3:36 p.m., Grodzovsky, Andrey wrote:
>> On 10/22/19 3:19 PM, Yang, Philip wrote:
>>> On 2019-10-22 2:40 p.m., Grodzovsky, Andrey wrote:
>>>> On 10/22/19 2:38 PM, Grodzovsky, Andrey wrote:
>>>
On 10/24/19 7:01 AM, Christian König wrote:
Am 24.10.19 um 12:58 schrieb S, Shirish:
[Why]
Upon GPU reset, the kernel cleans up already submitted jobs
via drm_sched_cleanup_jobs.
This schedules IBs via drm_sched_main()->run_job, leading to a
race condition of rings being ready or not, since during rese
On 10/25/19 4:44 AM, Christian König wrote:
> Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky:
>> Problem:
>> When run_job fails and the HW fence returned is NULL we still signal
>> the s_fence to avoid hangs, but the user has no way of knowing if
>> the actual HW job was run and finished.
>>
>> Fix:
>>
if (!ring->sched.ready) {
+	dump_stack();
	dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n",
		ring->name);
	return -EINVAL;
On 10/24/2019 10:00 PM, Christian König wrote:
Am 24.10.19 um 17:06 schrieb Grodzovsky, Andrey:
On 10/24/19 7:01
On 10/25/19 11:55 AM, Koenig, Christian wrote:
> Am 25.10.19 um 16:57 schrieb Grodzovsky, Andrey:
>> On 10/25/19 4:44 AM, Christian König wrote:
>>> Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky:
>>>> Problem:
>>>> When run_job fails and the HW fence returne
On 10/25/19 11:57 AM, Koenig, Christian wrote:
Am 25.10.19 um 17:35 schrieb Grodzovsky, Andrey:
On 10/25/19 5:26 AM, Koenig, Christian wrote:
Am 25.10.19 um 11:22 schrieb S, Shirish:
On 10/25/2019 2:23 PM, Koenig, Christian wrote:
amdgpu_do_asic_reset starting to resume blocks
...
amdgpu
On 10/29/19 2:03 PM, Dan Carpenter wrote:
> On Tue, Oct 29, 2019 at 11:04:44AM -0400, Andrey Grodzovsky wrote:
>> Fix a static code checker warning.
>>
>> Signed-off-by: Andrey Grodzovsky
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions
That's good as proof of RCA, but I still think we should grab a dedicated
lock inside the scheduler, since the race is internal to scheduler code, so
it's better to handle it inside the scheduler to make the fix apply
to all drivers using it.
Andrey
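As a rough sketch of what a dedicated per-scheduler lock could look like, using the drm_sched field names of that era (ring_mirror_list, job_list_lock) - this illustrates the suggestion, it is not the actual fix:

/* All mirror list manipulation goes through a lock owned by the
 * scheduler itself, so every driver using drm_sched inherits the fix
 * instead of relying on per-driver locking. */
struct drm_sched_job *s_job, *tmp;

spin_lock(&sched->job_list_lock);
list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
	if (dma_fence_is_signaled(&s_job->s_fence->finished))
		list_del_init(&s_job->node);
}
spin_unlock(&sched->job_list_lock);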
On 10/30/19 4:44 AM, S, Shirish wrote:
>>
On 10/30/19 6:22 AM, S, Shirish wrote:
> On 10/30/2019 3:50 PM, Koenig, Christian wrote:
>> Am 30.10.19 um 10:13 schrieb S, Shirish:
>>> [Why]
>>>
>>> doing kthread_park()/unpark() from drm_sched_entity_fini
>>> while GPU reset is in progress defeats all the purpose of
>>> drm_sched_stop->kthread_
) hack in
> drm_sched_entity_fini().
>
> We could do this with a struct completion or convert the scheduler
> from a thread to a work item.
>
> Regards,
> Christian.
>
> Am 30.10.19 um 15:44 schrieb Grodzovsky, Andrey:
>> That's good as proof of RCA, but I still think we should
taking all those locks
> in the right order.
>
> Christian.
>
> Am 30.10.19 um 15:56 schrieb Grodzovsky, Andrey:
>> Can you elaborate on what the tricky part with the lock is? I assumed
>> we'd just use a per-scheduler lock.
>>
>> Andrey
>>
>> On 10/
Reviewed-by: Andrey Grodzovsky
Andrey
On 10/30/19 6:20 AM, Koenig, Christian wrote:
> Am 30.10.19 um 10:13 schrieb S, Shirish:
>> [Why]
>>
>> doing kthread_park()/unpark() from drm_sched_entity_fini
>> while GPU reset is in progress defeats all the purpose of
>
On 11/8/19 5:35 AM, Koenig, Christian wrote:
> Hi Emily,
>
> exactly that can't happen. See here:
>
>> /* Don't destroy jobs while the timeout worker is running */
>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> !cancel_delayed_work(&sched->work_tdr))
>>
On 11/8/19 5:54 AM, Deng, Emily wrote:
> Hi Christian,
> Sorry, it seems I understood wrong. And from the print, the free job's
> thread is the same as the job timeout thread. So it seems there is some
> issue in function amdgpu_device_gpu_recover.
I don't think that's correct; it seems your prints just don
Thinking more about this claim - we assume here that if cancel_delayed_work
returned true it guarantees that the timeout work is not running, but it merely
means there was a pending timeout work which was removed from the workqueue
before its timer elapsed, and so it didn't have a chance to be deque
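To make the distinction concrete, a minimal sketch against the sched->work_tdr quoted earlier (not a proposed patch):

/* Per the workqueue API: cancel_delayed_work() kills a *pending*
 * delayed_work and returns true if one was pending; a handler that
 * already started (or re-armed itself) may still be running on return. */
bool was_pending = cancel_delayed_work(&sched->work_tdr);

/* Only cancel_delayed_work_sync() also waits for a running handler to
 * finish; it may sleep, so it cannot be called from atomic context. */
cancel_delayed_work_sync(&sched->work_tdr);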
o the issue Emily reported can be
avoided.
Andrey
From: Deng, Emily
Sent: 25 November 2019 16:44:36
To: Grodzovsky, Andrey
Cc: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
Christian; steven.pr...@arm.com; Grodzovsky, Andrey
Subjec
From: Christian König
Sent: 12 December 2019 03:31
To: Alex Deucher; Grodzovsky, Andrey
Cc: Deucher, Alexander; Ma, Le; Quan, Evan; amd-gfx list; Zhang, Hawking
Subject: Re: [PATCH 2/5] drm: Add Reusable
Patches 1-3 Reviewed-by: Andrey Grodzovsky
Patch 4 Acked-by: Andrey Grodzovsky
Andrey
On 10/16/2018 07:55 AM, Christian König wrote:
> Make sure we always restart the timer after a timeout and remove the
> device specific workarounds.
>
> Signed-off-by: Christian König
> ---
Eclipse
Andrey
On 10/19/2018 03:13 AM, Michel Dänzer wrote:
> ... and the new line here.
>
> Which editor are you using?
On 10/19/2018 03:08 AM, Koenig, Christian wrote:
> Am 18.10.18 um 20:44 schrieb Andrey Grodzovsky:
>> A ring might become unusable after reset; if that's the case
>> drm_sched_entity_select_rq will choose another, working rq
>> to run the job if there is one.
>> Also, skip recovery of ring which is
That's my next step.
Andrey
On 10/19/2018 12:28 PM, Christian König wrote:
From my testing it looks like we can: compute ring 0 is dead but IB tests
pass on other compute rings.
Interesting, but I would rather investigate why compute ring 0 is dead while
others still work.
On 10/23/2018 05:23 AM, Christian König wrote:
> Am 22.10.18 um 22:46 schrieb Andrey Grodzovsky:
>> Start using drm_gpu_scheduler.ready instead.
>>
>> v3:
>> Add helper function to run ring test and set
>> sched.ready flag status accordingly, clean explicit
>> sched.ready sets from the IP specifi
On 10/22/2018 05:33 AM, Koenig, Christian wrote:
> Am 19.10.18 um 22:52 schrieb Andrey Grodzovsky:
>> Problem:
>> A particular scheduler may become unusable (underlying HW) after
>> some event (e.g. GPU reset). If it's later chosen by
>> the get free sched. policy a command will fail to be
>> sub
On 10/26/2018 04:05 AM, Christian König wrote:
> Am 25.10.18 um 22:16 schrieb Andrey Grodzovsky:
>> Problem: After GPU reset on dGPUs with gfx8 compute ring
>> 1.0.0 fails to pass the ring test. Ring registers inspection
>> shows that it's active and no hang is observed (rptr == wptr)
>> No signi
Reviewed-by: Andrey Grodzovsky
Andrey
On 10/29/2018 11:28 AM, Christian König wrote:
> We already print an error message that an IB test failed in the common
> code.
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 18 +++
>
Typo, series is Reviewed-by: Andrey Grodzovsky
Andrey
On 10/29/2018 12:18 PM, Grodzovsky, Andrey wrote:
> Reviewed-by: Andrey Grodzovsky
>
> Andrey
>
>
> On 10/29/2018 11:28 AM, Christian König wrote:
>> We already print an error message that an IB test fail
On 10/31/2018 03:49 PM, Alex Deucher wrote:
> On Wed, Oct 31, 2018 at 2:33 PM Andrey Grodzovsky
> wrote:
>> Illegal access will cause a CP hang followed by job timeout and
>> recovery kicking in.
>> Also, disable the suite for all APU ASICs until GPU
>> reset issues for them are resolved and G
On 10/31/2018 03:49 PM, Alex Deucher wrote:
> On Wed, Oct 31, 2018 at 2:33 PM Andrey Grodzovsky
> wrote:
>> Illegal access will cause a CP hang followed by job timeout and
>> recovery kicking in.
>> Also, disable the suite for all APU ASICs until GPU
>> reset issues for them are resolved and G
On 11/02/2018 10:24 AM, Michel Dänzer wrote:
> On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote:
>> Illegal access will cause a CP hang followed by job timeout and
>> recovery kicking in.
>> Also, disable the suite for all APU ASICs until GPU
>> reset issues for them are resolved and GPU reset
On 11/02/2018 02:12 PM, Alex Deucher wrote:
> On Fri, Nov 2, 2018 at 11:59 AM Grodzovsky, Andrey
> wrote:
>>
>>
>> On 11/02/2018 10:24 AM, Michel Dänzer wrote:
>>> On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote:
>>>> Illegal access will cause CP h
There is a pplib messaging-related failure currently during GPU reset. I will
put this issue on my TODO
list for a later time, after handling more prioritized stuff, and will disable
the deadlock test suite for all non-dGPU gfx8/9 ASICs until then.
Andrey
On 11/02/2018 02:14 PM, Grodzovsky
Reviewed-by: Andrey Grodzovsky
Question - shouldn't we also set psp_xgmi_node_info.is_sharing_enabled
to 1 to enable FB sharing?
Andrey
On 11/08/2018 11:14 AM, Liu, Shaoyun wrote:
> From: shaoyunl
>
> The driver needs to call each psp instance to get topology info before setting the topology
>
> Change-
On 11/21/2018 02:29 PM, Alex Deucher wrote:
> On Wed, Nov 21, 2018 at 1:11 PM Andrey Grodzovsky
> wrote:
>> This is prep work for updating each PSP FW in hive after
>> GPU reset.
>> Split into build topology SW state and update each PSP FW in the hive.
>> Save topology and count of XGMI devices
Depends on what the reason for triggering the reset for that node was - how
do we know?
If the reason was a RAS error it's probably not hard to check that all errors
are cleared, but
if the reason was a job timeout on that specific node I will need to
recheck that no jobs are left in an incomplete state.
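As a sketch of the per-node check being described - all names here are hypothetical, not actual amdgpu code:

/* Hypothetical helper: whether a node is known-clean after reset
 * depends on why the reset was triggered in the first place. */
bool xgmi_node_clean_after_reset(struct amdgpu_device *adev, int cause)
{
	switch (cause) {
	case RESET_CAUSE_RAS:		/* made-up enum value */
		/* RAS case: verifying all errors are cleared is enough. */
		return ras_pending_error_count(adev) == 0;
	case RESET_CAUSE_JOB_TIMEOUT:	/* made-up enum value */
		/* Timeout case: recheck that no jobs were left incomplete. */
		return !node_has_incomplete_jobs(adev);
	default:
		return true;
	}
}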
verything together again and start the
> scheduler to go on with job submission.
>
> Christian.
>
> Am 21.11.18 um 23:02 schrieb Grodzovsky, Andrey:
>> Depends on what the reason for triggering the reset for that node was - how
>> do we know?
>> If the reason was a RAS erro
On 11/22/2018 12:34 PM, Nicholas Kazlauskas wrote:
> [Why]
> Atomic check can't truly be non-blocking if amdgpu_dm is waiting for
> hw_done and flip_done in atomic check. This introduces waits when
> any previous non-blocking commits queued work on a worker thread and
> a new atomic commit attemp
On 11/22/2018 02:43 PM, Kazlauskas, Nicholas wrote:
> On 11/22/18 2:39 PM, Grodzovsky, Andrey wrote:
>>
>> On 11/22/2018 12:34 PM, Nicholas Kazlauskas wrote:
>>> [Why]
>>> Atomic check can't truly be non-blocking if amdgpu_dm is waiting for
>>&
On 11/22/2018 02:03 PM, Christian König wrote:
> Am 22.11.18 um 16:44 schrieb Grodzovsky, Andrey:
>>
>> On 11/22/2018 06:16 AM, Christian König wrote:
>>> How about using a lock per hive and then acquiring that with trylock()
>>> instead?
>>>
>>
t and do it for each driver in between
scheduler deactivation and activation back?
Andrey
On 11/22/2018 02:56 PM, Grodzovsky, Andrey wrote:
In addition to that I would try to improve the pre, middle, post handling
towards checking if we made some progress in between.
In other words we stop all sc
Ping...
Andrey
On 11/27/2018 01:37 PM, Andrey Grodzovsky wrote:
> This set of patches adds support to reset entire XGMI hive
> when reset is required.
>
> Patches 1-2 refactor the XGMI infrastructure a bit as
> preparation for the actual hive reset change.
>
> Patch 5 is GPU reset/recovery ref
On 11/30/2018 04:03 AM, Christian König wrote:
> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> Add per hive wq to allow all the nodes in hive to run resets
>> concuren
On 11/30/2018 10:53 AM, Koenig, Christian wrote:
> Am 30.11.18 um 16:14 schrieb Grodzovsky, Andrey:
>> On 11/30/2018 04:03 AM, Christian König wrote:
>>> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>>>> XGMI hive has some resources allocated on device init whic
On 11/30/2018 02:49 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> wrote:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> v2: Remove creation of dedicated wq for XGMI hive reset.
>>
>>
On 11/30/2018 03:08 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> wrote:
>>
>>
>> On 11/30/2018 02:49 PM, Alex Deucher wrote:
>>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
>>> wrote:
>>>> XGMI
On 11/30/2018 03:30 PM, Alex Deucher wrote:
> Use this to track whether an asic supports xgmi rather than
> checking the asic type everywhere.
>
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
> driver
Reviewed-by: Andrey Grodzovsky
Andrey
On 11/30/2018 03:36 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey
> wrote:
>>
>>
>> On 11/30/2018 03:30 PM, Alex Deucher wrote:
>>> Use this to track whether an asic supports xgmi rathe
On 12/05/2018 02:59 PM, Nicholas Kazlauskas wrote:
> [Why]
> Legacy cursor plane updates from drm helpers go through the full
> atomic codepath. A high volume of cursor updates through this slow
> code path can cause subsequent page-flips to skip vblank intervals
> since each individual update is
On 12/05/2018 03:42 PM, Kazlauskas, Nicholas wrote:
> On 2018-12-05 3:26 p.m., Grodzovsky, Andrey wrote:
>>
>> On 12/05/2018 02:59 PM, Nicholas Kazlauskas wrote:
>>> [Why]
>>> Legacy cursor plane updates from drm helpers go through the full
>>> atomic
Not an expert on Freesync, so maybe a stupid question, but from the comment
it looks like this pipe locking is only there for the sake of Freesync mode
- why is it then called unconditionally, without checking if you even run in
Freesync mode?
Andrey
On 12/06/2018 08:42 AM, Kazlauskas, Nicholas wrote:
>
Ok - the change is Acked-by: Andrey Grodzovsky
Andrey
On 12/06/2018 10:59 AM, Nicholas Kazlauskas wrote:
> On 2018-12-06 10:36 a.m., Grodzovsky, Andrey wrote:
>> Not an expert on Freesync, so maybe a stupid question, but from the comment
>> it looks like this pipe locking is only
On 12/06/2018 12:41 PM, Andrey Grodzovsky wrote:
> Expedite job deletion from the ring mirror list to the HW fence signal
> callback instead of from finish_work; together with waiting for all
> such fences to signal in drm_sched_stop we guarantee that an
> already signaled job will not be processed twice.
>
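A simplified sketch of the idea in that commit message - field names approximate the drm_sched of the time and are not exact:

/* Drop the job from the ring mirror list as soon as its HW fence
 * signals, instead of deferring to finish_work, so a stopped scheduler
 * can never process an already signaled job twice. */
static void sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
	struct drm_gpu_scheduler *sched = s_job->sched;
	unsigned long flags;

	spin_lock_irqsave(&sched->job_list_lock, flags);
	list_del_init(&s_job->node);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);
}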
On 12/06/2018 01:33 PM, Christian König wrote:
> Am 06.12.18 um 18:41 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from
On 12/07/2018 03:19 AM, Christian König wrote:
> Am 07.12.18 um 04:18 schrieb Zhou, David(ChunMing):
>>
>>> -Original Message-
>>> From: dri-devel On Behalf Of
>>> Andrey Grodzovsky
>>> Sent: Friday, December 07, 2018 1:41 AM
>>> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedes
Acked-by: Andrey Grodzovsky
Andrey
On 12/10/2018 04:29 PM, Kuehling, Felix wrote:
> This function was renamed in a previous commit. Update the stub
> function name for builds with CONFIG_HSA_AMD disabled.
>
> Fixes: 62f65d3cb34a ("drm/amdgpu: Add KFD VRAM limit checking&
dzovsky
>> Sent: Tuesday, December 11, 2018 5:44 AM
>> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>> ckoenig.leichtzumer...@gmail.com; e...@anholt.net;
>> etna...@lists.freedesktop.org
>> Cc: Zhou, David(ChunMing) ; Liu, Monk
>> ; Grodzovsky, Andrey
&
np
Andrey
On 12/11/2018 03:18 PM, Alex Deucher wrote:
> On Tue, Dec 11, 2018 at 3:13 PM Andrey Grodzovsky
> wrote:
>> I retested GPU recovery with Bonaire ASIC and it works.
>>
>> Signed-off-by: Andrey Grodzovsky
> Reviewed-by: Alex Deucher
>
> Care to enable it in the kernel as well?
>
> Al
ote:
> Yeah, completely correctly explained.
>
> I was unfortunately really busy today, but going to give that a look
> as soon as I have time.
>
> Christian.
>
> Am 11.12.18 um 17:01 schrieb Grodzovsky, Andrey:
>> As I understand it, you say that by the time the fence callback r
Just a reminder. Any new comments in light of all the discussion?
Andrey
On 12/12/2018 08:08 AM, Grodzovsky, Andrey wrote:
> BTW, the problem I pointed out with drm_sched_entity_kill_jobs_cb is not
> an issue with this patch set since it removes the cb from
> s_fence->finished in g
On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
> [Why]
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped s
On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
>>
>> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
>>> [Why]
>>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>>> whethe
In general I agree with Michel that a DRM-level solution is required to
properly address this, but since for now it's not really obvious what the
proper solution is, it seems OK to me to go with this fix until it's found.
Reviewed-by: Andrey Grodzovsky
Andrey
On 12/14/2018 12:51 PM, Kazlauskas
With this change in the latest drm-next and the related commit in the latest FW I get
[ 148.887374] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[ 148.887535] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP
block failed -22
Had to revert to be able to boot.
Andrey
On
On 12/14/2018 02:17 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 2:06 PM, Grodzovsky, Andrey wrote:
>> In general I agree with Michel that a DRM-level solution is required to
>> properly address this, but since for now it's not really obvious what the
>> proper solution is, it seem
On 12/17/2018 04:53 AM, Michel Dänzer wrote:
> On 2018-12-15 6:25 a.m., Grodzovsky, Andrey wrote:
>> On 12/14/2018 02:17 PM, Kazlauskas, Nicholas wrote:
>>> On 12/14/18 2:06 PM, Grodzovsky, Andrey wrote:
>>>> In general I agree with Michel that DRM solution is req
On 12/17/2018 10:27 AM, Christian König wrote:
> Am 10.12.18 um 22:43 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from
On 12/17/2018 01:51 PM, Wentland, Harry wrote:
> On 2018-12-15 4:42 a.m., Mikhail Gavrilov wrote:
>> On Sat, 15 Dec 2018 at 00:36, Wentland, Harry wrote:
>>> Looks like there's an error before this happens that might get us into this
>>> mess:
>>>
>>> [ 229.741741] [drm:amdgpu_job_timedout [am
On 12/18/2018 10:26 AM, sunpeng...@amd.com wrote:
> From: Leo Li
>
> drm_atomic_helper_check_planes() calls the crtc atomic check helpers. In
> an attempt to better align with the DRM framework, we can move the
> entire dm_update dance to the crtc check helper (since it essentially
> checks that
On 12/18/2018 12:09 PM, Kazlauskas, Nicholas wrote:
> On 12/18/18 10:26 AM, sunpeng...@amd.com wrote:
>> From: Leo Li
>>
>> drm_atomic_helper_check_planes() calls the crtc atomic check helpers. In
>> an attempt to better align with the DRM framework, we can move the
>> entire dm_update dance to
On 12/19/2018 08:54 AM, Kazlauskas, Nicholas wrote:
> On 12/18/18 3:12 PM, Grodzovsky, Andrey wrote:
>>
>> On 12/18/2018 10:26 AM, sunpeng...@amd.com wrote:
>>> From: Leo Li
>>>
>>> drm_atomic_helper_check_planes() calls the crtc atomic check helpers.
On 12/19/2018 11:21 AM, Christian König wrote:
> Am 17.12.18 um 20:51 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from
+Tom
Andrey
On 12/19/2018 01:35 PM, Mikhail Gavrilov wrote:
> On Tue, 18 Dec 2018 at 00:08, Grodzovsky, Andrey
> wrote:
>> Please install UMR and dump the gfx ring content and waves after the hang
>> happens.
>>
>> UMR at - https://cgit.freedesktop.org/amd/umr
I believe this issue would be resolved by my pending-in-review patch
set, specifically 'drm/sched: Refactor ring mirror list handling.', since
already in the first TO handler it will go over all the rings, including
the second timed-out ring, and will remove all callbacks, including the
bad job c
On 12/21/2018 01:37 PM, Christian König wrote:
> Am 20.12.18 um 20:23 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from
HW fence processing'.
> Now there was still much Call-Trace in new osdb triggered in
> dma_fence_set_error. Do you have a link for these patches?
> Thanks.
>
> BR,
> Wentao
>
>
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: Saturday, December 22