Acked-by: Jingwen Chen
Still need confirmation from Christian.
On 9/1/22 5:29 PM, ZhenGuo Yin wrote:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0
> [amdtt
, Christian
>> ; dan...@ffwll.ch
>> *Subject:* Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI
>> is ready
>> No because all the patch-set including this patch was landed into
>> drm-misc-next and will reach amd-staging-drm-next on the next upstream
>>
Hi Andrey,
Will you port this patch into amd-staging-drm-next?
On 2/10/22 2:06 AM, Andrey Grodzovsky wrote:
> All comments are fixed and code pushed. Thanks for everyone
> who helped reviewing.
>
> Andrey
>
> On 2022-02-09 02:53, Christian König wrote:
>> On 09.02.22 at 01:23, Andrey Grodz
Hi Andrey,
I have been testing your patch and it seems fine so far.
Best Regards,
Jingwen Chen
On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote:
> Just another ping, with Shyun's help I was able to do some smoke testing on
> XGMI SRIOV system (booting and triggering hive reset)
>
Hi Andrey,
I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.
On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:
> Just a gentle ping.
>
> Andrey
>
Hi Andrey,
Please go ahead and push your change. I will prepare the RFC later.
On 2022/1/8 12:02 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-07 12:46 a.m., JingWen Chen wrote:
>> On 2022/1/7 11:57 AM, JingWen Chen wrote:
>>> On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>>
On 2022/1/7 11:57 AM, JingWen Chen wrote:
> On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>> On 2022-01-06 12:18 a.m., JingWen Chen wrote:
>>> On 2022/1/6 12:59 PM, JingWen Chen wrote:
>>>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>>>>> On 2022-01-0
On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-06 12:18 a.m., JingWen Chen wrote:
>> On 2022/1/6 12:59 PM, JingWen Chen wrote:
>>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>>>> On 2022-01-05 2:59 a.m., Christian König wrote:
>>>>
On 2022/1/6 12:59 PM, JingWen Chen wrote:
> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>> On 2022-01-05 2:59 a.m., Christian König wrote:
>>> On 05.01.22 at 08:34, JingWen Chen wrote:
>>>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:
>>>>> O
On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-05 2:59 a.m., Christian König wrote:
>> On 05.01.22 at 08:34, JingWen Chen wrote:
>>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:
>>>> On 2022-01-04 6:36 a.m., Christian König wrote:
>>>
is that we need to adjust the implementation in amdgpu to
>>> actually match the requirements.
>>>
>>> Could be that the reset sequence is questionable in general, but I doubt so
>>> at least for now.
>>>
>>> See the FLR request from the hypervis
east for now.
>>
>> See the FLR request from the hypervisor is just another source of signaling
>> the need for a reset, similar to each job timeout on each queue. Otherwise
>> you have a race condition between the hypervisor and the scheduler.
>>
>> Properly
device_unlock_adev in flr_work instead of
try_lock, since nothing will conflict with this thread once reset_domain is
introduced.
But we do still need reset_sem and adev->in_gpu_reset to keep the device from
being touched by user space.
Best Regards,
Jingwen Chen
On 2022/1/3 6:17 PM, Christian König wrote:
I agree with Shaoyun: if the host finds that the GPU engine has hung first and
does the FLR, the guest-side thread may not know about it and may still try to
access the HW (e.g. kfd relies heavily on amdgpu_in_reset and reset_sem to
identify the reset status). This may lead to very bad results.
On 2021/12/24 4:58 PM
being deleted from the pending list. If we instead use the ordered
workqueue for timedout in the driver, there will be no bailing job.
Do you have any suggestions?
Best Regards,
JingWen Chen
On Mon Sep 06, 2021 at 02:36:52PM +0800, Liu, Monk wrote:
> [AMD Official Use Only]
>
> > I'm feari
On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-09-01 12:25 a.m., Jingwen Chen wrote:
> > On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> > > I will answer everything here -
> > >
> > > O
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> I will answer everything here -
>
> On 2021-08-31 9:58 p.m., Liu, Monk wrote:
>
>
> [AMD Official Use Only]
>
>
>
> In the previous discussion, you guys stated that we should drop the
> “kthread_should_park”
---
> > Monk Liu | Cloud-GPU Core team
> > --
> >
> > -Original Message-
> > From: Daniel Vetter
> > Sent: Thursday, August 19, 2021 5:31 PM
> > To: Grodzovsky, Andrey
> > Cc: Daniel Vetter ; Alex Deuch
revert this
commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
v2:
add dma_fence_get/put() around timedout_job to avoid concurrent delete
during processing timedout_job
v3:
park sched->thread instead during timedout_job.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/schedu