On Mon Jun 2, 2025 at 3:25 AM PDT, Philipp Reisner wrote:
> Hi Christopher,
>
> Thanks for following up. The bug still annoys me from time to time.
> It triggered last on May 8, May 12, and May 18.
> The crash on May 18 was already with the 6.14.5 kernel.
>
>> Could this sleep wake issue also be caused by a similar thing to the
>> panics and SMU hangs I was experiencing with my own issue? It's an issue
>> known to have the same workaround for both 6000 and 7000 series users. A
>> specific kernel commit seems to affect it as well.
>>
>
> I posted the stack trace earlier in the thread. The question is, what
> was the stack trace of the issue you are referring to?
>
>>
>> If you could test whether you can still reproduce the error after
>> disabling GFXOFF states with the following kernel commandline override:
>>
>> amdgpu.ppfeaturemask=0xfff73fff
>>
>
> that disables PP_OVERDRIVE_MASK, PP_GFXOFF_MASK,
> and PP_GFX_DCS_MASK.
>
> IMHO, that looks like a mitigation for something different than the non-ready
> compute schedulers that seem to be the root cause for the NULL pointer derefs
> in my case.

Indeed, it's mitigating something that leads to SMU firmware hangs. I
guessed, probably poorly, that your compute units may be failing to wake
up due to an SMU hang. But you have no SMU hang notices in your logs, so
it's probably not that. Oh well.

>
> Anyhow, I will give it a try, and will report back if my workstation
> does not deref NULL pointers for more than three weeks with that
> amdgpu.ppfeaturemask set.
>
> Best regards,
> Philipp
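
For reference, the reading of the mask quoted above can be checked bit by bit. This is only a minimal sketch: the three bit values are an assumption, taken from my recollection of enum PP_FEATURE_MASK in drivers/gpu/drm/amd/include/amd_shared.h, so please verify them against your own kernel tree. The helper name cleared_bits is made up for illustration.

```python
# Bit values ASSUMED from enum PP_FEATURE_MASK in amd_shared.h;
# verify them against the kernel source you are actually running.
PP_BITS = {
    "PP_OVERDRIVE_MASK": 0x4000,
    "PP_GFXOFF_MASK": 0x8000,
    "PP_GFX_DCS_MASK": 0x80000,
}


def cleared_bits(mask):
    """Return the names of the listed features whose bit is 0 in mask."""
    return [name for name, bit in PP_BITS.items() if not mask & bit]


# The override suggested earlier in this thread.
print(cleared_bits(0xFFF73FFF))
```

Under those assumed bit values, all three names come back for 0xfff73fff, which matches what Philipp wrote.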