On 05.08.24 at 08:02, James Lawrence wrote:
Apologies if I'm hitting the wrong mailing list. Long-time user, first-time
reporter and all that.
Sorry for the delayed reply. Without a maintainer in CC, such requests
are usually overlooked on the mailing list.
Recently my system has been suffering from instability in the
graphics system. Essentially, some application on my system is causing
an OOM condition for graphics memory.
Normally I'd just expect a hard crash of the application in such a
scenario. Instead, the system enters a spin loop of command submissions
and slows down dramatically, generally ending with the system freezing up.
There are a couple issues I'd like to point out with the current
situation I'm experiencing:
* Most importantly, the error message doesn't provide any useful
information for tracing the source of the issue: no PID or other
diagnostic information.
* It's very noisy when trying to debug. I can occasionally drop my
system to a separate TTY, but the message just spams the entire
screen, making it impossible to interact with my system even if I
wanted to load up debugging tools to analyze the situation.
Given the error message, I believe this line is the source of the log
statement:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1431
Generally I'm wondering if there is anything that can be done to
improve the experience for end users in such a scenario.
Ideally the system would nuke the misbehaving process, similar to how
RAM OOMs are handled.
If you see this message you should get the OOM killer running along with it.
If you don't see that, then you have probably run into a BUG or something
like that.
What kernel version are you using and what did you do to trigger that?
Regards,
Christian.
But at a minimum I'd like to be able to figure out how to trace this
back to the misbehaving process. Any help in this regard would be
appreciated.
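
One possible way to trace memory use back to a process, as a rough sketch
only: newer kernels expose per-client DRM statistics in
/proc/<pid>/fdinfo/<fd>, including `drm-memory-<region>` and
`drm-client-id` keys. Whether a given kernel and driver build provides
them is an assumption here, and the helper names (`parse_fields`,
`parse_drm_memory`, `scan_processes`) are made up for illustration. The
script sums those values per PID, counting each DRM client once per process.

```python
import os

# Units as they appear in the fdinfo values; a bare number is taken as bytes.
UNIT_FACTORS = {"": 1, "KiB": 1024, "MiB": 1024 ** 2}
PREFIX = "drm-memory-"


def parse_fields(fdinfo_text):
    """Split 'key: value' lines of one fdinfo file into a dict."""
    fields = {}
    for line in fdinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_drm_memory(fdinfo_text):
    """Return {region: bytes} for every 'drm-memory-<region>' line."""
    usage = {}
    for key, value in parse_fields(fdinfo_text).items():
        parts = value.split()
        if not key.startswith(PREFIX) or not parts or not parts[0].isdigit():
            continue
        unit = parts[1] if len(parts) > 1 else ""
        usage[key[len(PREFIX):]] = int(parts[0]) * UNIT_FACTORS.get(unit, 1)
    return usage


def scan_processes(proc_root="/proc"):
    """Sum DRM memory per PID; other users' processes need root to read."""
    totals = {}
    try:
        pids = [name for name in os.listdir(proc_root) if name.isdigit()]
    except OSError:
        return totals
    for pid in pids:
        fd_dir = os.path.join(proc_root, pid, "fdinfo")
        try:
            entries = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        seen_clients = set()
        for entry in entries:
            try:
                with open(os.path.join(fd_dir, entry)) as handle:
                    text = handle.read()
            except OSError:
                continue
            client = parse_fields(text).get("drm-client-id")
            if client is None or client in seen_clients:
                continue  # not a DRM fd, or a duplicate of one already counted
            seen_clients.add(client)
            per_pid = totals.setdefault(int(pid), {})
            for region, size in parse_drm_memory(text).items():
                per_pid[region] = per_pid.get(region, 0) + size
    return totals
```

Sorting the result of `scan_processes()` by its "vram" or "gtt" entry should
point at the largest consumers. A DRM file descriptor shared between
processes is counted under each of them, so the numbers are only a hint.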