On Sat, Jun 27, 2026 at 10:18:45PM +1000, Sam Day via B4 Relay wrote:
> From: Sam Day <[email protected]>
>
> If the peak vote for mdp1-mem is allowed to drop to zero, it seems to
> cause the fabric to collapse that path entirely, which causes the device
> to bus stall and fatally reset.
>
> This issue was identified specifically on sdm845-oneplus-fajita, so this
> workaround is applied narrowly to SDM845's MDSS.
>
> ---
> This RFC patch is a spiritual successor to the "Addressing stability
> issues on SDM845 with the -next tree" series sent by David and Petr 6
> months ago.
>
> As Dmitry pointed out, the patch introduces leakages to the runtime PM
> refcounting. In practice, this means that MDSS never actually gets
> suspended, which is why the patch appeared to "fix" the issue.
>
> The deeper root cause is that, when msm_mdss_disable() runs and unvotes
> the mdp1-mem interconnect bandwidth, that seems to collapse the fabric
> entirely and causes the bus stall -> hang -> reboot behaviour.
>
> I've confirmed that a tiny non-zero peak bandwidth vote keeps the fabric
> alive and avoids the issue.
>
> Of course, this is still a fairly egregious hack, but it *does* allow
> blanking to suspend and resume DSI + DPU + MDSS properly without the bus
> stall.

I'm a bit sceptical about this patch. The Lenovo Yoga C630 uses a variant
of SDM845. On that device I don't observe any issues with the MM itself,
but cluster suspend can cause issues there too. I suspect that there is a
missing vote (or an undervote) on CX or MX, which results in suspend/resume
crashes. If that's true, then your patch compensates for exactly that: I
think it adds an internal CX vote which is never dropped, preventing CX
collapse.

>
> Here's what I've validated with instrumentation:
>
> * DSI host disable, IRQ disable, PLL state save, host power-off, link
>   clock disable, regulator disable, SFPB disable, and PHY disable all
>   complete successfully before the fatal reset occurs.
> * DPU runtime suspend also completes. The bandwidth accounting was
>   checked and confirmed to reach runtime suspend with 0 refs, with no
>   pending frame state.
> * The device survives through MDSS clock disabling and mdp0-mem
>   zero voting; it's really just the mdp1-mem zero vote that is isolated
>   as the cause of the stall + reset.

Will it work if you suspend the MDSS (dropping all votes) and then
forcibly abort the device suspend by returning an error from a later
stage?

>
> So, I'm not really sure where to go from here. I'm sure that this
> workaround is not suitable for inclusion upstream as it still seems to
> be papering over an underlying issue... But it's unclear to me if this is
> some kind of hardware quirk on SDM845, a problem with the SDM845 DT
> wiring, a driver issue, or something else entirely.

I don't have good advice here. Try disabling the cluster idle node. If the
device still works, the problem is not mdp1-mem.

>
> I'd appreciate any advice on how to further diagnose this issue and what
> direction to take from here.
>
> Kind regards,
> -Sam
>
> Link:
> https://lore.kernel.org/phone-devel/[email protected]/
> Signed-off-by: Sam Day <[email protected]>

-- 
With best wishes
Dmitry

