On Thu, Jan 23, 2025 at 11:51:27AM +0000, Robin Murphy wrote:
> On 2025-01-23 11:10 am, Prakash Gupta wrote:
> > On Wed, Jan 22, 2025 at 03:00:58PM -0500, Connor Abbott wrote:
> > > +	/*
> > > +	 * The SMMUv2 architecture specification says that if stall-on-fault is
> > > +	 * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
> > > +	 * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
> > > +	 * first before running the user's fault handler to make sure we follow
> > > +	 * this sequence. It should be ok if there is another fault in the
> > > +	 * meantime because we have already read the fault info.
> > > +	 */
> > The context would remain stalled till we write to CBn_RESUME. Which is done
> > in qcom_adreno_smmu_resume_translation(). For a stalled context further
> > transactions are not processed and we shouldn't see further faults and
> > or fault interrupts. Do you observe faults with stalled context?
>
> This aspect isn't exclusive to stalled contexts though - even for "normal"
> terminated faults, clearing the FSR as soon as we've sampled all the
> associated fault registers is no bad thing, since if a second fault does
> occur while we're still reporting the first, we're then more likely to get a
> full syndrome rather than just the FSR.MULTI bit.
>

The ARM SMMUv2 spec recommends that, for a reported fault, software should
first correct the condition that caused the fault (I would interpret this
as reporting the fault to the client via the callback) and then write
CBn_FSR and CBn_RESUME, in that order. Even for a reported fault where the
context is not stalled, I see no reason why the first step should be any
different.

I agree that delaying the fault clearance can result in FSR.MULTI being
set, but clearing the fault before reporting it prevents clients from using
SCTLR.HUPCF on subsequent transactions while they take any debug action.
The fault should be reported to the client in the same state in which it
occurred. Please refer to qcom_smmu_context_fault() for this sequence.
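
To make the ordering concrete, here is a rough userspace model of it. This
is only a sketch, not driver code: every model_* name and the bit values
are made up for illustration, and it folds the RESUME write into the
handler for brevity, whereas the driver does it later from
qcom_adreno_smmu_resume_translation(). It assumes FSR fault bits are
write-one-to-clear and that FSR.SS is only cleared by the RESUME write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values for this model only, not taken from arm-smmu.h. */
#define MODEL_FSR_TF	(1u << 1)	/* translation fault */
#define MODEL_FSR_SS	(1u << 30)	/* context is stalled */

/* Hypothetical model of one context bank; "step" orders the events. */
struct model_cb {
	uint32_t fsr;
	bool stalled;
	int step;
	int reported_at;
	int fsr_cleared_at;
	int resumed_at;
	uint32_t fsr_seen_by_client;
};

/* Stand-in for the client callback: records the FSR the client observes. */
static void model_report_fault(struct model_cb *cb)
{
	cb->fsr_seen_by_client = cb->fsr;
	cb->reported_at = ++cb->step;
}

/* Write-one-to-clear for fault bits; SS is assumed unaffected by this write. */
static void model_write_fsr(struct model_cb *cb, uint32_t val)
{
	cb->fsr &= ~(val & ~MODEL_FSR_SS);
	cb->fsr_cleared_at = ++cb->step;
}

/* The RESUME write ends the stall and is what clears SS in this model. */
static void model_write_resume(struct model_cb *cb)
{
	cb->stalled = false;
	cb->fsr &= ~MODEL_FSR_SS;
	cb->resumed_at = ++cb->step;
}

/* Proposed order: report to the client, then clear FSR, then RESUME. */
static void model_handle_fault(struct model_cb *cb)
{
	uint32_t fsr;

	assert(cb != NULL);
	fsr = cb->fsr;			/* sample the fault status first */
	model_report_fault(cb);		/* client sees the unmodified state */
	model_write_fsr(cb, fsr);
	if (cb->stalled)
		model_write_resume(cb);
}
```

With this order the client callback runs while the fault bits and the
stall are still in place, which is the state I would expect a client to
want when it takes a debug action.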

> > > +	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
> > > +
> > > 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > > 		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE
> > > 		: IOMMU_FAULT_READ);
> > > 	if (ret == -ENOSYS && __ratelimit(&rs))
> > > 		arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> > > -	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
> > > 	return IRQ_HANDLED;
> > > }
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > index
> > > 2dbf3243b5ad2db01e17fb26c26c838942a491be..789c64ff3eb9944c8af37426e005241a8288da20
> > > 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > @@ -216,7 +216,6 @@ enum arm_smmu_cbar_type {
> > > 				 ARM_SMMU_CB_FSR_TLBLKF)
> > > #define ARM_SMMU_CB_FSR_FAULT	(ARM_SMMU_CB_FSR_MULTI |
> > > \
> > > -				 ARM_SMMU_CB_FSR_SS | \
> > Given writing to FSR.SS doesn't clear this bit but write to CBn_RESUME
> > does, this seems right. This but can be taken as separate patch.
>
> This change on its own isn't really useful - all that would achieve is that
> instead of constantly re-reporting the FSR.SS "fault", the interrupt goes
> unhandled and the IRQ core ends up disabling it permanently. If anything
> that's arguably worse, since the storm of context fault reports does at
> least give a fairly clear indication of what's gone wrong, rather than
> having to deduce the cause of an "irq n: nobody cared" message entirely by
> code inspection.
>

Does the spec allow, or do we ever see, a reported fault with just the
FSR.SS bit set? If the answer is no, then keeping FSR_SS in the mask would
be misleading. Here ARM_SMMU_CB_FSR_FAULT is used to clear fault bits or to
check for valid faults. Also, the validity of this change does not depend
on the rest of the patch.

Thanks,
Prakash

> >
> > > 				 ARM_SMMU_CB_FSR_UUT |
> > > \
> > > 				 ARM_SMMU_CB_FSR_EF |
> > > \
> > > 				 ARM_SMMU_CB_FSR_PF |
> > > \
> > >
> > > --
> > > 2.47.1
> > >
>