On Thu, Jan 23, 2025 at 03:14:16PM -0500, Connor Abbott wrote: > On Thu, Jan 23, 2025 at 2:26 PM Prakash Gupta <quic_gup...@quicinc.com> wrote: > > > > On Thu, Jan 23, 2025 at 09:00:17AM -0500, Connor Abbott wrote: > > > On Thu, Jan 23, 2025 at 6:10 AM Prakash Gupta <quic_gup...@quicinc.com> > > > wrote: > > > > > > > > On Wed, Jan 22, 2025 at 03:00:58PM -0500, Connor Abbott wrote: > > > > The context would remain stalled till we write to CBn_RESUME. Which is > > > > done > > > > in qcom_adreno_smmu_resume_translation(). For a stalled context further > > > > transactions are not processed and we shouldn't see further faults and > > > > or fault inerrupts. Do you observe faults with stalled context? > > > > > > Yes. I've observed that on MMU-500 writing RESUME before the interrupt > > > has been cleared doesn't clear SS. This happened with v2 in the case > > > where there was already a devcoredump and drm/msm called > > > qcom_adreno_smmu_resume_translation() immediately from its fault > > > handler, and we'd get a storm of unhandled interrupts until it was > > > disabled. Given that the architecture spec says we're supposed to > > > clear the interrupt first this may have been an attempt to "help" > > > developers. > > > > > > > Just to clarify, present sequence is: > > 1. context fault with stall enable. FSR.SS set. > > 2. Report fault to client > > 3. resume/terminate stalled transaction > > 4. clear FSR > > > > At what point when you try #2->#3->#4 or #4->#2->#3 sequence, is FSR.SS > > cleared and interrupt storm is observed. > > With #2->#3->#4 FSR.SS is *not* cleared and there is a subsequent > interrupt storm with only FSR.SS asserted. With #4->#2->#3 there is no > interrupt storm. From this we conclude that MMU-500 does not clear > FSR.SS unless #4 happens before #3. > Thanks Connor for clarification. I get your point now. I think it's expected interrupt storm with #2->#3->#4 sequence is expected. With CONFIG_ARM_SMMU_QCOM_DEBUG enabled, context fault follows the sequence of #1->#2->#4->#3, which is spec recommended.
so, common fault handler can be modified to follow the same sequence, but I have concern regarding clearing FSR before reporting fault to client. qcom_adreno_smmu_get_fault_info() is an example I gave but other client handler may have similar expecation of fault register not changed before client fault handler is called. > > The way CFIE disable is helping > > with current patch indicates write FSR is unstalling context and subsequent > > transactions are faulted. > > No, it does not indicate that. The interrupt storm happens even when > there is exactly one context fault, and when the interrupt storm > happens *only* FSR.SS is asserted. I've verified this with debug > prints. Once more, MMU-500 will assert an interrupt when only FSR.SS > is asserted. This has nothing to do with subsequent transactions. > > > Do you stop getting interrupt storm after write > > RESUME. > > Yes, as long as the write to RESUME happens after all other bits in > FSR are cleared. > > > If you can mention your SCTLR configuration and FSR state in test > > sequence, it would be clearer. > > SCTLR has both HUPCF and CFCFG enabled. > > > > > An aside, If reducing delay between FSR and RESUME write helps then both > > can be done as part of qcom_adreno_smmu_resume_translation(). This will > > follow spec recommendation and also make sure fault registers are not > > cleared before reporting fault to client. I think this sequence should address the issue you are observing. > > > > Thanks, > > Prakash