On 4/16/2025 12:56 PM, Tvrtko Ursulin wrote:
On 15/04/2025 19:43, Sunil Khatri wrote:
Add process name and pid to the userqueue error logging so that
an error can be traced back to the offending process from the
logs alone.
Sample log:
[ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]]
*ERROR* Timed out waiting for fence f=000000001c74d978 for
comm:Xwayland pid:3427
[ 42.444669] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not
suspending userqueue, timeout waiting for comm:Xwayland pid:3427
[ 42.824729] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]]
*ERROR* Timed out waiting for fence f=0000000074407d3e for
comm:systemd-logind pid:1058
[ 42.825082] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not
suspending userqueue, timeout waiting for comm:systemd-logind pid:1058
Signed-off-by: Sunil Khatri <sunil.kha...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 1867520ba258..05c1ee27a319 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -43,7 +43,7 @@ amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr
*uq_mgr,
if (f && !dma_fence_is_signaled(f)) {
ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
if (ret <= 0) {
- DRM_ERROR("Timed out waiting for fence f=%p\n", f);
+ drm_file_err(uq_mgr->file, "Timed out waiting for fence f=%p\n", f);
You decided to leave %p after all?
Yes, we are printing the fence pointer here to see which fence is timing
out. For now, the intention of this patch is only to add process
information alongside the existing information, such as the fence.
regards
Sunil
return;
}
}
@@ -440,7 +440,8 @@ amdgpu_userqueue_resume_all(struct
amdgpu_userq_mgr *uq_mgr)
}
if (ret)
- DRM_ERROR("Failed to map all the queues\n");
+ drm_file_err(uq_mgr->file, "Failed to map all the queue\n");
You lost the plural by accident.
Yes, I will add the 's'. Noted.
I am also not sure "all the queues" makes sense in this context versus
"all queues" but it's inconsequential really.
Regards
Sunil
Yes, it is all queues from a uq_mgr.
+
return ret;
}
@@ -598,7 +599,8 @@ amdgpu_userqueue_suspend_all(struct
amdgpu_userq_mgr *uq_mgr)
}
if (ret)
- DRM_ERROR("Couldn't unmap all the queues\n");
+ drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
+
return ret;
}
@@ -615,7 +617,7 @@ amdgpu_userqueue_wait_for_signal(struct
amdgpu_userq_mgr *uq_mgr)
continue;
ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
if (ret <= 0) {
- DRM_ERROR("Timed out waiting for fence f=%p\n", f);
+ drm_file_err(uq_mgr->file, "Timed out waiting for fence f=%p\n", f);
return -ETIMEDOUT;
}
}
@@ -634,13 +636,13 @@ amdgpu_userqueue_suspend(struct
amdgpu_userq_mgr *uq_mgr,
/* Wait for any pending userqueue fence work to finish */
ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
if (ret) {
- DRM_ERROR("Not suspending userqueue, timeout waiting for work\n");
+ drm_file_err(uq_mgr->file, "Not suspending userqueue, timeout waiting\n");
return;
}
ret = amdgpu_userqueue_suspend_all(uq_mgr);
if (ret) {
- DRM_ERROR("Failed to evict userqueue\n");
+ drm_file_err(uq_mgr->file, "Failed to evict userqueue\n");
return;
It is pre-existing, but it strikes me as odd that a failure of
amdgpu_userqueue_suspend_all() logs a failure to *evict* instead of a
failure to suspend (as the previous log does). Anyway, I did not look at
the surrounding code, so I am just thinking out loud.
Yes, the suspend failed because not all the fences were evicted. In any
case, Alex already has patches that will change this to unmap, as a code
reorganisation for suspend/resume is in the pipeline.
regards
Sunil
Regards,
Tvrtko
}
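
For readers who want to see how the "for comm:<name> pid:<pid>" suffix in
the sample log could be produced, here is a minimal userspace sketch. It is
not the kernel's drm_file_err() implementation. The names mock_file and
mock_file_err are hypothetical and exist only for this illustration; the
sketch assumes the helper formats the caller's message, drops one trailing
newline, and appends the process name and pid.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-file process info (illustration only). */
struct mock_file {
	const char *comm;	/* process name, e.g. "Xwayland" */
	int pid;		/* process id */
};

/*
 * Hypothetical userspace analogue of a per-file error helper.
 * Formats the caller's message, strips one trailing newline, then writes
 * "<message> for comm:<name> pid:<pid>\n" into out.
 * Returns what snprintf() returns, or a negative value on a format error.
 */
static int mock_file_err(char *out, size_t out_len,
			 const struct mock_file *file,
			 const char *fmt, ...)
{
	char msg[256];
	va_list ap;
	size_t mlen;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	if (n < 0)
		return n;

	/* Callers pass "...\n"; drop it so the suffix stays on the same line. */
	mlen = strlen(msg);
	if (mlen > 0 && msg[mlen - 1] == '\n')
		msg[mlen - 1] = '\0';

	return snprintf(out, out_len, "%s for comm:%s pid:%d\n",
			msg, file->comm, file->pid);
}
```

Calling mock_file_err() with the message "Not suspending userqueue, timeout
waiting\n" and a mock_file of { "Xwayland", 3427 } yields the same tail as
the second line of the sample log. Keeping the suffix in one helper means
the call sites only pass the file, which is what the diff above does with
uq_mgr->file.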