Oops, looks only si will call this dma_stop() during hw_fini, I checked for sdma4.0/3.0/2.4, no such logic
-----Original Message----- From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] Sent: 2018年3月1日 17:39 To: Liu, Monk <monk....@amd.com>; Koenig, Christian <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer Hi Monk, look at si_dma_stop() and/or other SDMA generation code. Regards, Christian. Am 01.03.2018 um 10:37 schrieb Liu, Monk: > Hi Christian > > During suspend/resume, which stage will set sdma's ring->ready to > false ?? I can only find it in amdgpu_ring_fini() But I saw that it is > only invoked in unload_kms(), looks device_suspnd() will not set > ring->ready to false > > /Monk > > -----Original Message----- > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf > Of Liu, Monk > Sent: 2018年3月1日 16:52 > To: Koenig, Christian <christian.koe...@amd.com>; > amd-gfx@lists.freedesktop.org > Subject: RE: [PATCH 3/4] drm/amdgpu: don't return when ring not ready > for fill_buffer > >> amdgpu_evict_flags() and amdgpu_bo_move() needs to know if the SDMA ring is >> available during suspend/resume. > As long as we are in gpu reset, we don't return error when ring not ready, > cuz eventually it either success or failed and rescheduled by scheduler since > it is kernel job What's your concern ? > > /Monk > > > -----Original Message----- > From: Koenig, Christian > Sent: 2018年3月1日 16:41 > To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready > for fill_buffer > >> If we are in gpu rest stage, we can let SDMA job live with the case of ring >> not ready, can you point out where is the problem ? > amdgpu_evict_flags() and amdgpu_bo_move() needs to know if the SDMA ring is > available during suspend/resume. > > E.g. suspend works as this: > 1. Evict all normal BOs from VRAM using the SDMA. > 2. Disable SDMA, GART, PSP etc... e.g. disable the hardware. > 3. Unpin all BOs in VRAM used by hardware engines, e.g. the GART table, > firmware etc... > 4. Evict all remaining BOs from VRAM using CPU copies. > > Resume works the other way around: > 1. Move all BOs for GART table and firmware back into VRAM using CPU copies. > 2. Pin those BOs in VRAM and enable the hardware. > 3. Use the SDMA to move back all other BOs into VRAM. > > To figure out if we should use the CPU copy or the SDMA the ring->ready flag > is used. So removing this check is really a no-go because it would break > suspend/resume. > > The only other alternative I can see is to add a separate flag in amdgpu_mman > if buffer_funcs should be used or not. See all the callers of > amdgpu_ttm_set_active_vram_size() as well. > > Something like renaming amdgpu_ttm_set_active_vram_size() into > amdgpu_ttm_set_buffer_funcs_read() and ignoring any change if gpu_reset is > set. > > BTW: Changing the active VRAM size in GPU reset is quite a bug you stumbled > over here, so this really needs fixing anyway. > > Regards, > Christian. > > Am 01.03.2018 um 09:23 schrieb Liu, Monk: >> Accel_working is only for GFX ring, I don't think test it is appropriate for >> SDMA jobs .... >> >> If we are in gpu rest stage, we can let SDMA job live with the case of ring >> not ready, can you point out where is the problem ? >> >> In fact we did stress TDR test with this patch and it can really fix >> couple over kill issues >> >> >> /Monk >> >> -----Original Message----- >> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] >> Sent: 2018年3月1日 16:16 >> To: Liu, Monk <monk....@amd.com>; Koenig, Christian >> <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org >> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready >> for fill_buffer >> >> Ah, crap yeah that won't work since we don't free the ring. >> >> Key point is we need to distinct between the ring doesn't work temporary >> because we are in a GPU reset and it doesn't work at all because we are >> missing firmware or stuff like that. >> >> And no, checking the gpu_reset flag is totally racy and can't be done >> either. How about checking accel_working instead? >> >> Christian. >> >> Am 01.03.2018 um 07:01 schrieb Liu, Monk: >>>> Please change the test to use ring->ring_obj instead, this way we still >>>> bail out if somebody tries to submit commands before the ring is even >>>> allocated. >>> I don't understand how could fill_buffer() get run under the case that >>> ring->ring_obj is not even allocated ... where is such case ? >>> >>> >>> /Monk >>> >>> -----Original Message----- >>> From: Koenig, Christian >>> Sent: 2018年2月28日 20:46 >>> To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org >>> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not >>> ready for fill_buffer >>> >>> Good point, but in this case we need some other handling. >>> >>> Please change the test to use ring->ring_obj instead, this way we still >>> bail out if somebody tries to submit commands before the ring is even >>> allocated. >>> >>> And you also need to fix a couple of other places in amdgpu_ttm.c. >>> >>> Regards, >>> Christian. >>> >>> Am 28.02.2018 um 13:34 schrieb Liu, Monk: >>>> Because when SDMA was hang by like process A, and meanwhile another >>>> process B is already running into the code of fill_buffer() So just let >>>> process B continue, don't block it otherwise process B would fail by >>>> software reason . >>>> >>>> Let it run and finally process B's job would fail and GPU recover >>>> will repeat it again (since it is a kernel job) >>>> >>>> Without this solution other process will be greatly harmed by one >>>> black sheep that triggering GPU recover >>>> >>>> /Monk >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] >>>> Sent: 2018年2月28日 20:24 >>>> To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org >>>> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not >>>> ready for fill_buffer >>>> >>>> Am 28.02.2018 um 08:21 schrieb Monk Liu: >>>>> because this time SDMA may under GPU RESET so its ring->ready may >>>>> not true, keep going and GPU scheduler will reschedule this job if >>>>> it failed. >>>>> >>>>> give a warning on copy_buffer when go through direct_submit while >>>>> ring->ready is false >>>> NAK, that test has already saved us quite a bunch of trouble with the fb >>>> layer. >>>> >>>> Why exactly are you running into issues with that? >>>> >>>> Christian. >>>> >>>>> Change-Id: Ife6cd55e0e843d99900e5bed5418499e88633685 >>>>> Signed-off-by: Monk Liu <monk....@amd.com> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +----- >>>>> 1 file changed, 1 insertion(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>>>> index e38e6db..7b75ac9 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>>>> @@ -1656,6 +1656,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, >>>>> uint64_t src_offset, >>>>> amdgpu_ring_pad_ib(ring, &job->ibs[0]); >>>>> WARN_ON(job->ibs[0].length_dw > num_dw); >>>>> if (direct_submit) { >>>>> + WARN_ON(!ring->ready); >>>>> r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, >>>>> NULL, fence); >>>>> job->fence = dma_fence_get(*fence); @@ -1692,11 +1693,6 >>>>> @@ >>>>> int amdgpu_fill_buffer(struct amdgpu_bo *bo, >>>>> struct amdgpu_job *job; >>>>> int r; >>>>> >>>>> - if (!ring->ready) { >>>>> - DRM_ERROR("Trying to clear memory with ring turned off.\n"); >>>>> - return -EINVAL; >>>>> - } >>>>> - >>>>> if (bo->tbo.mem.mem_type == TTM_PL_TT) { >>>>> r = amdgpu_ttm_alloc_gart(&bo->tbo); >>>>> if (r) >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx