[ 84.167691] softirqs last disabled at (1342671): []
__irq_exit_rcu+0xd3/0x140
[ 84.167692] ---[ end trace ]---
[ 84.189957] PM: suspe
Thanks
xinhui
-----Original Message-----
From: Pan, Xinhui
Sent: Friday, November 10, 2023 12:51 PM
To: Kuehling, Felix ; amd-gfx@lists.f
@lists.freedesktop.org
Cc: Deng, Emily ; Pan, Xinhui ; Koenig,
Christian
Subject: [RFC PATCH v2] drm/amdkfd: Run restore_workers on freezable WQs
Make restore workers freezable so we don't have to explicitly flush them in
suspend and GPU reset code paths, and we don't accidentally try to restore BOs
whi
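A minimal sketch of the freezable-workqueue idea described above, using an illustrative workqueue name and init helper (not the actual patch):

#include <linux/workqueue.h>

/* Illustration only: work queued on a WQ_FREEZABLE workqueue is held by the
 * freezer across suspend/resume, so the suspend and GPU-reset paths no longer
 * need an explicit flush before evicting BOs. */
static struct workqueue_struct *kfd_restore_wq;

static int kfd_restore_wq_init(void)
{
	kfd_restore_wq = alloc_workqueue("kfd_restore_wq",
					 WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
	return kfd_restore_wq ? 0 : -ENOMEM;
}

Restore work would then be queued with queue_delayed_work(kfd_restore_wq, ...) instead of a system workqueue.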
, Christian
Sent: Wednesday, September 13, 2023 10:29 PM
To: Kuehling, Felix ; Christian König
; Pan, Xinhui ;
amd-gfx@lists.freedesktop.org; Wentland, Harry
Cc: Deucher, Alexander ; Fan, Shikang
Subject: Re: Re: [PATCH] drm/amdgpu: Ignore first eviction failure during suspend
[+Harry]
Am
tTest.BasicTest
pm-suspend
thanks
xinhui
From: Christian König
Sent: September 12, 2023 17:01
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
; Fan, Shikang
Subject: Re: [PATCH] drm/amdgpu: Ignore first eviction failure du
in its
suspend callback. So the first eviction before the kfd callback likely fails.
-----Original Message-----
From: Christian König
Sent: Friday, September 8, 2023 2:49 PM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
; Fan, Shikang
Subject: Re: [PATCH
[AMD Official Use Only - General]
Can we just add a kref for the entity?
Or just collect such job time usage somewhere else?
-----Original Message-----
From: Pan, Xinhui
Sent: Thursday, August 17, 2023 1:05 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; airl...@gmail.com;
dri-de
[AMD Official Use Only - General]
comments inline.
From: Koenig, Christian
Sent: November 29, 2022 20:07
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org;
linux-ker...@vger.kernel.org
[AMD Official Use Only - General]
comments inline.
From: Koenig, Christian
Sent: November 29, 2022 19:32
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; dri-de...@lists.freedesktop.org;
linux-ker...@vger.kernel.org
goto err_free;
thanks
xinhui
____________
From: Pan, Xinhui
Sent: November 29, 2022 18:56
To: amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; matthew.a...@intel.com; Koenig, Christian;
dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Paneer Se
Just re-sort these blocks in ascending order if the memory is
indeed contiguous?
thanks
xinhui
From: Christian König
Sent: November 29, 2022 1:11
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: New method to check
[AMD Official Use Only - General]
Hi Arun,
Thanks for your reply. Comments are inline.
From: Paneer Selvam, Arunpravin
Sent: November 29, 2022 1:09
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org; dri-de...@lists.freedesktop.org
Christian König ; Pan, Xinhui
; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence
On 2022-07-07 at 05:54, Christian König wrote:
> On 07.07.22 at 11:50, xinhui pan wrote:
>> Fence is accessed by dma_r
: April 13, 2022 15:30
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
We don't need that.
TTM only reschedules when the BOs are still busy.
And if the BOs are still busy when you unload the driver we have
ce->flags))
+ goto out;
+
if (intr && signal_pending(current)) {
ret = -ERESTARTSYS;
goto out;
From: Koenig, Christian
Sent: April 12, 2022 20:11
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org; Dani
Christian
Sent: November 9, 2021 21:18
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
Exactly that's the reason why we should have the double check in TTM
I've mentioned in t
ist is on vram domain) to sMem.
From: Pan, Xinhui
Sent: November 9, 2021 21:05
To: Koenig, Christian; amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
Yes, a stable tag i
omain_start(adev, mem->mem_type) +
209 mm_cur->start;
210 return 0;
211 }
line 208, *addr is zero. So when amdgpu_copy_buffer submits a job with such an addr,
a page fault happens.
From: Koenig, Christian
Sent
, Christian
Sent: November 9, 2021 20:20
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list
On 09.11.21 at 12:19, xinhui pan wrote:
> After we move BO to a new memory region, we should put it to
> the
[AMD Official Use Only]
Why? Just to evict some inactive VRAM BOs?
From: Koenig, Christian
Sent: Friday, September 17, 2021 3:06:16 PM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: Let BO created in its
[AMD Official Use Only]
Reviewed-by: xinhui pan
-----Original Message-----
From: amd-gfx On Behalf Of Andrey
Grodzovsky
Sent: September 16, 2021 3:42
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan ; Pan, Xinhui ; Deucher,
Alexander ; Grodzovsky, Andrey
Subject: [PATCH] drm/amdgpu: Fix crash on
From: Pan, Xinhui
Sent: September 15, 2021 14:37
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; Grodzovsky, Andrey; Pan, Xinhui
Subject: [PATCH v2] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath
We hit a soft hang while doing memory
; Pan, Xinhui;
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
On 9/13/2021 12:21 PM, Christian König wrote:
> Keep in mind that we don't try to avoid contention here. The goal is
> rather to have as few locks as possible t
[AMD Official Use Only]
These IB tests are all using direct IB submission including the delayed init
work.
From: Koenig, Christian
Sent: September 13, 2021 14:19
To: Pan, Xinhui; Christian König; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re
understand.
From: Koenig, Christian
Sent: September 13, 2021 14:31
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH v3 1/3] drm/amdgpu: UVD avoid memory allocation during IB test
On 11.09.21 at 03:34, xinhui pan wrote:
> m
: Christian König
Sent: September 13, 2021 14:35
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian; dan...@ffwll.ch; dri-de...@lists.freedesktop.org; Chen,
Guchun
Subject: Re: [RFC PATCH] drm/ttm: Try to check if new ttm man out of bounds during
compile
On 13.09.21 at 05:36,
[AMD Official Use Only]
Yep, that is a lazy way to fix it.
I am thinking of adding an amdgpu_ring.direct_access_mutex before we issue
test_ib on each ring.
From: Lazar, Lijo
Sent: September 13, 2021 12:00
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc
sync method.
But I see device resume itself would flush it. So there is no race between them,
as userspace is still frozen.
I will drop this flush in V2.
From: Christian König
Sent: September 11, 2021 15:45
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher
.
From: Koenig, Christian
Sent: September 10, 2021 19:10
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: Re: Re: Re: [PATCH 2/4] drm/amdgpu: UVD avoid memory allocation during
IB test
Yeah, but that IB test should use the indirect submission through the
scheduler
[AMD Official Use Only]
We need to take this lock.
IB tests can be triggered through debugfs. These days I usually test it by cat'ing
gpu recovery and amdgpu_test_ib in debugfs.
From: Koenig, Christian
Sent: September 10, 2021 18:02
To: Pan, Xinhui; amd-gfx
should use
DIRECT pool.
Looks like we should only use the reserved BO for direct IB submission.
As for delayed IB submission, we could alloc a new one dynamically.
From: Koenig, Christian
Sent: September 10, 2021 16:53
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc
[AMD Official Use Only]
I am wondering if amdgpu_bo_pin would change the BO's placement in the future.
For now, the new placement is calculated as new = old ∩ new.
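A tiny illustration of that intersection rule, with made-up flag values and a made-up helper name (only the bitwise-AND idea is the point):

/* Illustration only: the requested pin domain is intersected with what the BO
 * already allows, so pinning cannot move the BO into a forbidden domain. */
#define DOMAIN_GTT  0x2
#define DOMAIN_VRAM 0x4

static unsigned int pinned_domain(unsigned int old_allowed, unsigned int requested)
{
	return old_allowed & requested;	/* new = old ∩ new */
}
/* Example: old = DOMAIN_GTT | DOMAIN_VRAM, requested = DOMAIN_VRAM -> DOMAIN_VRAM */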
From: Koenig, Christian
Sent: September 10, 2021 14:24
To: Pan, Xinhui; amd-gfx@lists.freedeskto
[AMD Official Use Only]
I am using vim with
set tabstop=8
set shiftwidth=8
set softtabstop=8
From: Koenig, Christian
Sent: September 10, 2021 14:33
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: [PATCH 4/4] drm/amdgpu: VCN avoid
; Koenig,
Christian ; Pan, Xinhui ;
Deucher, Alexander
Cc: Chen, Guchun ; Shi, Leslie
Subject: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array
bounds
Vendors may define their own memory types on top of TTM_PL_PRIV,
but they call ttm_set_driver_manager directly without checking mem_type
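A minimal sketch of the bounds check the patch subject describes, modeled on the upstream ttm_set_driver_manager() helper (the exact form of the check in the real patch may differ):

static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
					  struct ttm_resource_manager *manager)
{
	/* Catch a driver registering a manager past the end of man_drv[]. */
	BUG_ON(type >= TTM_NUM_MEM_TYPES);
	bdev->man_drv[type] = manager;
}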
[AMD Official Use Only]
Well, if the IB test fails because we use the GTT domain or
VRAM above 256MB, then the failure is expected.
Doesn't the IB test exist to detect such issues?
From: Koenig, Christian
Sent: Thursday, September 9, 2021 15:16
To: Pan, Xinhui; am
[AMD Official Use Only]
Yep, VCN needs 128KB of extra memory. I will make the pool size a constant 256KB.
From: Koenig, Christian
Sent: Thursday, September 9, 2021 3:14:15 PM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re
[AMD Official Use Only]
There is one dedicated IB pool for IB tests, so let's use it for the extra msg
too.
For UVD on older HW, use one reserved BO at a specific range.
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 173 +++-
drivers/gpu/drm/amd/amdgpu/amd
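A rough sketch of drawing the extra message from the dedicated direct IB pool instead of allocating a separate BO; the surrounding function name and msg_size are illustrative, while amdgpu_ib_get()/amdgpu_ib_free() are the driver's existing helpers:

static int submit_test_msg(struct amdgpu_device *adev, unsigned int msg_size)
{
	struct amdgpu_ib ib;
	int r;

	memset(&ib, 0, sizeof(ib));
	/* Allocate msg space from the direct pool reserved for IB tests. */
	r = amdgpu_ib_get(adev, NULL, msg_size, AMDGPU_IB_POOL_DIRECT, &ib);
	if (r)
		return r;

	/* ... write the create/destroy message into ib.ptr and submit ... */

	amdgpu_ib_free(adev, &ib, NULL);
	return 0;
}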
[AMD Official Use Only]
Direct IB pool is used for vce/uvd/vcn IB extra msg too. Increase its
size to 64 pages.
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
b/driv
> On September 8, 2021 at 14:23, Christian König wrote:
>
> On 08.09.21 at 03:25, Pan, Xinhui wrote:
>>> On September 7, 2021 at 20:37, Koenig, Christian wrote:
>>>
>>> On 07.09.21 at 14:26, xinhui pan wrote:
>>>> There is one dedicated IB pool for IB test. So lets use it fo
> On September 7, 2021 at 20:37, Koenig, Christian wrote:
>
> On 07.09.21 at 14:26, xinhui pan wrote:
>> There is one dedicated IB pool for IB test. So lets use it for uvd msg
>> too.
>>
>> For some older HW, use one reserved BO at specific range.
>>
>> Signed-off-by: xinhui pan
>> ---
>> drivers/gpu/drm/am
[AMD Official Use Only]
It is the internal staging drm-next.
-----Original Message-----
From: Koenig, Christian
Sent: September 6, 2021 19:26
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; che...@uniontech.com;
dri-de...@lists.freedesktop.org
Subject: Re: [PATCH v2 1/2] drm
> On September 6, 2021 at 17:04, Christian König wrote:
>
>
>
> On 06.09.21 at 03:12, xinhui pan wrote:
>> A long time ago, someone reports system got hung during memory test.
>> In recent days, I am trying to look for or understand the potential
>> deadlock in ttm/amdgpu code.
>>
>> This patchset aims to fi
[AMD Official Use Only]
Like VCE/VCN does, visible VRAM is OK for the IB test.
However, commit a11d9ff3ebe0 ("drm/amdgpu: use GTT for
uvd_get_create/destory_msg") says VRAM is not mapped correctly on his
platform, which is likely arm64.
So let's change back to using VRAM on the x86_64 platform.
Signed-off-b
[AMD Official Use Only]
The ret value might be -EBUSY; the caller will then think the lru lock is still
locked, but actually it is NOT. So return -ENOSPC instead, otherwise we hit
list corruption.
ttm_bo_cleanup_refs might fail too if the BO is not idle. If we return 0, the
caller (ttm_tt_populate -> ttm_global_swapout -> ttm
[AMD Official Use Only]
A long time ago, someone reported that the system hung during a memory test.
In recent days, I have been trying to find and understand the potential
deadlock in the ttm/amdgpu code.
This patchset aims to fix the deadlock during ttm populate.
TTM has a parameter called pages_limit; when a
Fall through to handle the error instead of returning.
Fixes: f8aab60422c37 ("drm/amdgpu: Initialise drm_gem_object_funcs for
imported BOs")
Cc: sta...@vger.kernel.org
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 23 ++-
1 file changed, 10 insertions(+
Fall through to handle the error instead of returning.
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 85b292ed5c43..7ddd429052ea 100644
On 2021/8/31 13:38, "Pan, Xinhui" wrote:
On 2021/8/31 12:03, "Grodzovsky, Andrey" wrote:
On 2021-08-30 11:24 p.m., Pan, Xinhui wrote:
> [AMD Official Use Only]
>
> [AMD Official Use Only]
>
> Unreserve root B
On 2021/8/31 12:03, "Grodzovsky, Andrey" wrote:
On 2021-08-30 11:24 p.m., Pan, Xinhui wrote:
> [AMD Official Use Only]
>
> [AMD Official Use Only]
>
> Unreserve root BO before return otherwise next allocation got deadlock.
>
> Sign
[AMD Official Use Only]
Unreserve the root BO before returning, otherwise the next allocation deadlocks.
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/driver
[AMD Official Use Only]
If a GTT BO has been evicted/swapped out, it should sit in the CPU domain.
TTM only allocates struct ttm_resource instead of struct ttm_range_mgr_node
for sysMem.
Now when we update the mapping for such invalidated BOs, we might walk out
of bounds of struct ttm_resource.
Three poss
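A hedged illustration of the out-of-bounds pattern described above; the structures mirror TTM's range manager, and the guard shown is only one possible fix, not necessarily the one the patchset takes:

struct ttm_range_mgr_node {
	struct ttm_resource base;
	struct drm_mm_node mm_nodes[];
};

static struct drm_mm_node *bo_mm_nodes(struct ttm_resource *res)
{
	/* For an evicted/swapped-out GTT BO only a bare struct ttm_resource was
	 * allocated, so the container_of() below would read past the end of the
	 * allocation. Check the memory type before downcasting. */
	if (res->mem_type == TTM_PL_SYSTEM)
		return NULL;
	return container_of(res, struct ttm_range_mgr_node, base)->mm_nodes;
}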
> On July 14, 2021 at 16:33, Christian König wrote:
>
> Hi Eric,
>
> feel free to push into amd-staging-dkms-5.11, but please don't push it into
> amd-staging-drm-next.
>
> The later will just cause a merge failure which Alex needs to resolve
> manually.
>
> I can take care of pushing to amd-staging-dr
Felix,
What I am wondering is: if the CP hangs, can we assume all usermode
queues have stopped?
If so, we can do the cleanup work regardless of the retval of execute_queues_cpsch().
> On June 17, 2021 at 20:11, Pan, Xinhui wrote:
>
> Felix
> what I am thinking of like below looks like
ang) {
+ retval = -EIO;
+ goto failed_try_destroy_debugged_queue;
+ }
+
if (qpd->is_debug) {
/*
* error, currently we do not allow to destroy a queue
> On June 17, 2021 at 20:02, Pan, Xinhui wrote:
>
> Handle queue destroy failur
> On June 17, 2021 at 06:55, Kuehling, Felix wrote:
>
> On 2021-06-16 4:35 a.m., xinhui pan wrote:
>> Some resource are freed even destroy queue fails.
>
> Looks like you're keeping this behaviour for -ETIME. That is consistent with
> what pqn_destroy_queue does. What you're fixing here is the behaviour f
> On June 16, 2021 at 12:36, Kuehling, Felix wrote:
>
> On 2021-06-16 at 12:01 a.m., Pan, Xinhui wrote:
>>> On June 16, 2021 at 02:22, Kuehling, Felix wrote:
>>>
>>> [+Xinhui]
>>>
>>>
>>> On 2021-06-15 at 1:50 p.m., Amber Lin wrote:
>>&
> On June 16, 2021 at 02:22, Kuehling, Felix wrote:
>
> [+Xinhui]
>
>
> On 2021-06-15 at 1:50 p.m., Amber Lin wrote:
>> Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a
>> circular lock. destroy_queue_nocpsch_locked is called under a DQM lock,
>> which is taken in MMU notifiers, potent
> On June 15, 2021 at 20:01, Christian König wrote:
>
> On 15.06.21 at 13:57, xinhui pan wrote:
>> Amdgpu set SG flag in populate callback. So TTM still count pages in SG
>> BO.
>
> It's probably better to fix this instead. E.g. why does amdgpu modify the SG
> flag during populate and not during initial
o.bdev->lru_lock);
ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
dma_resv_unlock(bo->tbo.base.resv);
From: Kuehling, Felix
Sent: May 22, 2021 2:24
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Chris
[AMD Official Use Only]
I just sent out the patch below yesterday. Swapping an unpopulated BO is indeed
useless.
[RFC PATCH 2/2] drm/ttm: skip swapout when ttm has no backend page.
From: Christian König
Sent: May 20, 2021 14:39
To: Pan, Xinhui; Kuehling, Felix
König; Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; dan...@ffwll.ch; Koenig, Christian;
dri-de...@lists.freedesktop.org
Subject: Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout
and swapin
Looks like we're creating the userptr BO as ttm_bo_type_devi
as TTM_PAGE_FLAG_SWAPPED is set.
Now here is the problem: we swap data back into the ttm backend memory from swap storage.
That just causes the memory to be overwritten.
From: Christian König
Sent: May 19, 2021 18:01
To: Pan, Xinhui; Kuehling, Felix; amd-gfx@lists.freedesktop.org
PRIORITY; ++i) {
- list_for_each_entry(bo, &glob->swap_lru[i], swap) {
[snip]
+ for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
+ for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j) {
________
From: Pan, Xinhui
Sent: 2021
Chris' patch, as I think it doesn't help. Or I can have a
try later.
From: Kuehling, Felix
Sent: May 19, 2021 11:29
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; dri-de...@lists.freedesktop.org;
dan...@ffwll.ch
Subject
Memory
TEST_F(KFDMemoryTest, MemoryAlloc) {
TEST_START(TESTPROFILE_RUNALL)
--
2.25.1
____________
From: Pan, Xinhui
Sent: May 19, 2021 10:28
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix; Deucher, Alexander; Koenig, Christian;
dri-de...@lists.freedeskt
_
From: Yu Kuai
Sent: May 17, 2021 16:16
To: Deucher, Alexander; Koenig, Christian; Pan, Xinhui; airl...@linux.ie;
dan...@ffwll.ch
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
linux-ker...@vger.kernel.org; yuku...@huawei.com; yi.zh...@huawei.com
Subject: [PATCH] drm/amdgp
[AMD Official Use Only - Internal Distribution Only]
Reviewed-by: xinhui pan
From: Christian König
Sent: Wednesday, May 5, 2021 7:01:46 PM
To: Pan, Xinhui ; Deucher, Alexander
; amd-gfx@lists.freedesktop.org
Subject: [PATCH] MAINTAINERS: Add Xinhui Pan as
[AMD Official Use Only - Internal Distribution Only]
I don't think so. Start is the offset here. We get the valid physical address from
pages_addr[offset] when we update the mapping.
Btw, what issue are we seeing?
-----Original Message-----
From: amd-gfx On Behalf Of Christian
König
Sent: March 23, 2021 2
[AMD Official Use Only - Internal Distribution Only]
Because this is not a deadlock of the lock itself.
It is just something like:
while(true) {
LOCKIRQ
...
UNLOCKIRQ
...
}
I think the scheduler preemption model is voluntary, so it never schedules out if there is no
sleeping function, and then the soft lockup shows up
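A minimal sketch of that pattern and the usual remedy, with made-up names (drain_list, process_one); the point is only that an explicit cond_resched() outside the critical section gives a voluntary-preemption kernel a scheduling point:

#include <linux/sched.h>
#include <linux/spinlock.h>

/* Illustration only: a loop that merely takes and drops a spinlock never
 * sleeps, so it can trip the soft-lockup watchdog; cond_resched() between
 * iterations is the usual fix. */
static void drain_list(spinlock_t *lock, bool (*process_one)(void))
{
	bool done = false;

	while (!done) {
		spin_lock_irq(lock);
		done = process_one();	/* work done under the lock */
		spin_unlock_irq(lock);
		cond_resched();		/* explicit scheduling point */
	}
}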
[AMD Official Use Only - Internal Distribution Only]
No, the patch from Nirmoy did not fully fix this issue. I will send another fix
patch later.
-----Original Message-----
From: amd-gfx On Behalf Of Christian
König
Sent: March 20, 2021 17:08
To: Kuehling, Felix ; Paneer Selvam, Arunpravin
; amd
[AMD Official Use Only - Internal Distribution Only]
Please ignore this patch.
-----Original Message-----
From: Pan, Xinhui
Sent: September 29, 2020 13:17
To: amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Deucher, Alexander
; Pan, Xinhui
Subject: [PATCH] amd/amdgpu: Fix resv shared fence
Reviewed-by: xinhui pan
> On September 3, 2020 at 17:03, Christian König wrote:
>
> Calculate the correct value for max_entries or we might run after the
> page_address array.
>
> v2: Xinhui pointed out we don't need the shift
> v3: use local copy of start and simplify some calculation
> v4: fix the case that
> On September 2, 2020 at 22:50, Tuikov, Luben wrote:
>
> On 2020-09-02 00:43, Pan, Xinhui wrote:
>>
>>
>>> On September 2, 2020 at 11:46, Tuikov, Luben wrote:
>>>
>>> On 2020-09-01 21:42, Pan, Xinhui wrote:
>>>> If you take a look at the below function, you s
> On September 2, 2020 at 23:21, Christian König wrote:
>
> Calculate the correct value for max_entries or we might run after the
> page_address array.
>
> v2: Xinhui pointed out we don't need the shift
> v3: use local copy of start and simplify some calculation
>
> Signed-off-by: Christian König
> Fixes: 1e6
> On September 2, 2020 at 22:31, Christian König wrote:
>
> On 02.09.20 at 16:27, Pan, Xinhui wrote:
>>
>>> On September 2, 2020 at 22:05, Christian König wrote:
>>>
>>> Calculate the correct value for max_entries or we might run after the
>>> page_address array.
>
> On September 2, 2020 at 22:05, Christian König wrote:
>
> Calculate the correct value for max_entries or we might run after the
> page_address array.
>
> v2: Xinhui pointed out we don't need the shift
>
> Signed-off-by: Christian König
> Fixes: 1e691e244487 drm/amdgpu: stop allocating dummy GTT nodes
> ---
> On September 2, 2020 at 20:05, Christian König wrote:
>
> Calculate the correct value for max_entries or we might run after the
> page_address array.
>
> Signed-off-by: Christian König
> Fixes: 1e691e244487 drm/amdgpu: stop allocating dummy GTT nodes
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
> 1
285 list_move(&vm_bo->vm_status, &vm_bo->vm->relocated);
>> 286 else
>> 287 amdgpu_vm_bo_idle(vm_bo);
>> 288 }
>>
>> Why you need to do the bo->parent check out side ?
because it was me who moved such logic into amdgpu_vm_bo_r
> On September 2, 2020 at 11:46, Tuikov, Luben wrote:
>
> On 2020-09-01 21:42, Pan, Xinhui wrote:
>> If you take a look at the below function, you should not use driver's
>> release to free adev. As dev is embedded in adev.
>
> Do you mean "look at the function below"
"Tuikov, Luben"
Date: Wednesday, September 2, 2020 09:07
To: "amd-gfx@lists.freedesktop.org" ,
"dri-de...@lists.freedesktop.org"
Cc: "Deucher, Alexander" , Daniel Vetter
, "Pan, Xinhui" , "Tuikov, Luben"
Subject: [PATCH 0/3] Use implicit kref infra
Use
of the total release sequence.
Or still use final_kfree to free adev, and our release callback just does some
other cleanup work.
From: Tuikov, Luben
Sent: Wednesday, September 2, 2020 4:35:32 AM
To: Alex Deucher ; Pan, Xinhui ;
Daniel Vetter
Cc: amd-gfx@lists.freed
[AMD Official Use Only - Internal Distribution Only]
Remove the private obj from the internal list before we free aconnector.
[ 56.925828] BUG: unable to handle page fault for address: 8f84a870a560
[ 56.933272] #PF: supervisor read access in kernel mode
[ 56.938801] #PF: error_code(0x00
[AMD Official Use Only - Internal Distribution Only]
drm_dev_alloc() allocates *dev* and sets managed.final_kfree to dev so it frees
itself.
Now, since commit 5cdd68498918 ("drm/amdgpu: Embed drm_device into
amdgpu_device (v3)"), we allocate *adev* and ddev is just a member of it.
So drm_dev_release tries to free a
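A hedged sketch of the embedding this describes, using the 2020-era drm_dev_init()/drmm_add_final_kfree() helpers and simplified, made-up names; an illustration only, not the actual amdgpu code:

#include <linux/slab.h>
#include <drm/drm_drv.h>
#include <drm/drm_managed.h>

struct my_adev {			/* stand-in for struct amdgpu_device */
	struct drm_device ddev;		/* embedded, not separately allocated */
	/* ... driver state ... */
};

static struct my_adev *my_adev_alloc(const struct drm_driver *drv,
				     struct device *parent)
{
	struct my_adev *adev = kzalloc(sizeof(*adev), GFP_KERNEL);

	if (!adev)
		return NULL;
	if (drm_dev_init(&adev->ddev, drv, parent)) {
		kfree(adev);
		return NULL;
	}
	/* Register the container as the allocation to free on final release,
	 * so the release path frees adev rather than just &adev->ddev. */
	drmm_add_final_kfree(&adev->ddev, adev);
	return adev;
}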
(struct amdgpu_device *adev,
unsigned int block)
From: Dan Carpenter
Sent: Wednesday, May 6, 2020 5:17:34 PM
To: Zhou1, Tao
Cc: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Subject: Re: [bug report] drm/amdgpu: add amdgpu_ras.c to support ras (v2)
On Wed, May 06
that breaks the device list in gpu recovery.
From: Pan, Xinhui
Sent: Friday, April 17, 2020 7:11:40 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org
; Zhang, Hawking ; Li,
Dennis ; Clements, John ; Koenig,
Christian
Subject: Re: [PATCH] drm/amdgpu: fix
[AMD Official Use Only - Internal Distribution Only]
This patch should fix the panic,
but I would like you to NOT add the adev xgmi head to the local device list if a RAS
UE occurs while the GPU is already in GPU recovery.
From: amd-gfx on behalf of Christian
König
ils :
Image[0]: Size(61952 Bytes), Type(Legacy Image)
Image[1]: Size(43520 Bytes), Type(EFI Image)
发件人: "Liang, Prike"
日期: 2020年4月13日 星期一 12:23
收件人: "Pan, Xinhui" , Johannes Hirte
抄送: "Deucher, Alexander" , "Huang, Ray"
, "Quan, Evan" ,
&q
Prike,
I hit this issue too. Reboot hung with my Vega10; it is OK with Navi10.
From: amd-gfx on behalf of Liang, Prike
Sent: Sunday, April 12, 2020 11:49:39 AM
To: Johannes Hirte
Cc: Deucher, Alexander ; Huang, Ray
; Quan, Evan ;
amd-gfx@lists.freedesktop.org
other hand you are right that cond_resched() has the advantage that we
> could spend more time on cleaning up old BOs if there is nothing else for the
> CPU TODO.
>
> Regards,
> Christian.
>
> Am 09.04.20 um 16:24 schrieb Pan, Xinhui:
>> https://elixir.bootlin.com/linux
https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026
This is another example of the usage of cond_resched.
From: Pan, Xinhui
Sent: Thursday, April 9, 2020 10:11:08 PM
To: Lucas Stach ; amd-gfx@lists.freedesktop.org
; Koenig, Christian
Cc: dri-de
I think it doesn't matter if the work item schedules out. Even if we did not schedule
out, the workqueue itself would schedule out later.
So this patch does not break anything, I think.
From: Pan, Xinhui
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach
From: Koenig, Christian
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach ; Pan, Xinhui ;
amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete
worker
On 09.04.20 at 15:25, Lucas Stach wrote
Reviewed-by: xinhui pan
> On March 31, 2020 at 22:25, Christian König wrote:
>
> The exclusive fence is only optional.
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu
Reviewed-by: xinhui pan
> On March 30, 2020 at 18:50, Christian König wrote:
>
> The problem is that we can't add the clear fence to the BO
> when there is an exclusive fence on it since we can't
> guarantee the the clear fence will complete after the
> exclusive one.
>
> To fix this refactor the function
> On March 27, 2020 at 16:24, Koenig, Christian wrote:
>
> On 27.03.20 at 04:08, xinhui pan wrote:
>> We have three ib pools, they are normal, VM, direct pools.
>>
>> Any jobs which schedule IBs without dependence on gpu scheduler should
>> use DIRECT pool.
>>
>> Any jobs schedule direct VM update IBs sho
> On March 26, 2020 at 14:51, Koenig, Christian wrote:
>
>
>
> On 26.03.2020 07:45, "Pan, Xinhui" wrote:
>
>
> > On March 26, 2020 at 14:36, Koenig, Christian wrote:
> >
> >
> >
> > On 26.03.2020 07:15, "Pan, Xinhui" wrote:
> >
>
> On March 26, 2020 at 14:36, Koenig, Christian wrote:
>
>
>
> On 26.03.2020 07:15, "Pan, Xinhui" wrote:
>
>
> > On March 26, 2020 at 13:38, Koenig, Christian wrote:
> >
> > Yeah that's on my TODO list for quite a while as well.
> >
> > But we
for IB tests pool.
>
> Thanks,
> Christian.
>
> On 26.03.2020 03:02, "Pan, Xinhui" wrote:
> Another ib poll for direct submit.
> Any jobs schedule IBs without dependence on gpu scheduler should use
> this pool firstly.
>
> Signed-off-by: xinhui pan
> ---
>
have a little time to fix this deadlock.
If you want to repro it, set the gpu timeout to 50ms, then run vulkan, ocl,
amdgputest, etc. together.
I believe you will see more weird issues.
From: Liu, Monk
Sent: Thursday, March 26, 2020 1:31:04 PM
To: Pan, Xinhui ; amd-gfx
Well, submitting a job with HW disabled should be no harm.
The only concern is that we might use up IBs if we park the scheduler thread during
recovery.
I have seen recovery get stuck in the sa new function.
The ring test allocates IBs to check whether recovery succeeded or not. But if there are
not enough IBs it will wait on fences