arm-smmu: Fix spurious interrupts with stall-on-fault

Prakash Gupta Thu, 23 Jan 2025 09:46:15 -0800

On Thu, Jan 23, 2025 at 11:51:27AM +0000, Robin Murphy wrote:
> On 2025-01-23 11:10 am, Prakash Gupta wrote:
> > On Wed, Jan 22, 2025 at 03:00:58PM -0500, Connor Abbott wrote:
> > > + /*
> > > +  * The SMMUv2 architecture specification says that if stall-on-fault is
> > > +  * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
> > > +  * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
> > > +  * first before running the user's fault handler to make sure we follow
> > > +  * this sequence. It should be ok if there is another fault in the
> > > +  * meantime because we have already read the fault info.
> > > +  */
> > The context would remain stalled until we write to CBn_RESUME, which is
> > done in qcom_adreno_smmu_resume_translation(). For a stalled context,
> > further transactions are not processed, and we shouldn't see further
> > faults or fault interrupts. Do you observe faults with a stalled context?
> 
> This aspect isn't exclusive to stalled contexts though - even for "normal"
> terminated faults, clearing the FSR as soon as we've sampled all the
> associated fault registers is no bad thing, since if a second fault does
> occur while we're still reporting the first, we're then more likely to get a
> full syndrome rather than just the FSR.MULTI bit.
> 
The ARM SMMUv2 spec recommends that, for a reported fault, software should
first correct the condition which caused the fault (I would interpret this
as reporting the fault to the client through its callback) and then write
CBn_FSR and CBn_RESUME, in this order. Even for a reported fault where the
context is not stalled, I see no reason why the first step should be any
different. I agree that delaying fault clearance can result in FSR.MULTI
being set, but clearing the fault first prevents clients from using
SCTLR.HUPCF on subsequent transactions while they take any debug action.
The fault should be reported to the client in the same state in which it
occurred. Please refer to qcom_smmu_context_fault() for this sequence.
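
To make the ordering I am arguing for concrete, here is a minimal,
self-contained sketch. It is a mock, not driver code: struct mock_cb,
mock_cb_write(), mock_client_handler() and the MOCK_* names are all
hypothetical, and the bit positions are assumptions for illustration only.
The mock models FSR as write-one-to-clear except for SS, which is cleared
only by a RESUME write, as discussed elsewhere in this thread.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define MOCK_FSR_SS (1u << 30) /* stalled status */
#define MOCK_FSR_PF (1u << 3)  /* permission fault */

/* Hypothetical stand-ins for the context bank registers. */
enum mock_reg { MOCK_CB_FSR, MOCK_CB_RESUME };

struct mock_cb {
	uint32_t fsr;          /* latched fault status */
	enum mock_reg log[8];  /* order in which registers were written */
	int nwrites;
	int handler_saw_fault; /* was FSR still set when the client ran? */
};

static void mock_cb_write(struct mock_cb *cb, enum mock_reg reg, uint32_t val)
{
	if (cb->nwrites < 8)
		cb->log[cb->nwrites++] = reg;

	if (reg == MOCK_CB_FSR)
		cb->fsr &= ~(val & ~MOCK_FSR_SS); /* W1C, but SS is left alone */
	else
		cb->fsr &= ~MOCK_FSR_SS;          /* RESUME ends the stall */
}

/* Hypothetical client callback: records whether the fault is still latched. */
static void mock_client_handler(struct mock_cb *cb)
{
	cb->handler_saw_fault = (cb->fsr != 0);
}

/* Report first, then clear FSR, then write RESUME. */
static void mock_handle_fault_report_first(struct mock_cb *cb)
{
	uint32_t fsr = cb->fsr;               /* sample the fault state */

	mock_client_handler(cb);              /* 1. report to the client */
	mock_cb_write(cb, MOCK_CB_FSR, fsr);  /* 2. clear the fault */
	mock_cb_write(cb, MOCK_CB_RESUME, 0); /* 3. resume (placeholder value) */
}
```

With this ordering the client callback runs while the fault state is still
latched, and the FSR write still precedes the RESUME write.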


> > > + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
> > > +
> > >           ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > >                   cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > >           if (ret == -ENOSYS && __ratelimit(&rs))
> > >                   arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> > > - arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
> > >           return IRQ_HANDLED;
> > >   }
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > index 2dbf3243b5ad2db01e17fb26c26c838942a491be..789c64ff3eb9944c8af37426e005241a8288da20 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > @@ -216,7 +216,6 @@ enum arm_smmu_cbar_type {
> > >                                            ARM_SMMU_CB_FSR_TLBLKF)
> > >   #define ARM_SMMU_CB_FSR_FAULT           (ARM_SMMU_CB_FSR_MULTI |        \
> > > -                                  ARM_SMMU_CB_FSR_SS |           \
> > Given that writing to FSR.SS doesn't clear this bit but a write to
> > CBn_RESUME does, this seems right. This part can be taken as a separate
> > patch.
> 
> This change on its own isn't really useful - all that would achieve is that
> instead of constantly re-reporting the FSR.SS "fault", the interrupt goes
> unhandled and the IRQ core ends up disabling it permanently. If anything
> that's arguably worse, since the storm of context fault reports does at
> least give a fairly clear indication of what's gone wrong, rather than
> having to deduce the cause of an "irq n: nobody cared" message entirely by
> code inspection.
> 
Does the spec allow, or do we ever see, a reported fault with just the
FSR.SS bit set? If the answer is no, then keeping FSR_SS would be
misleading. Here ARM_SMMU_CB_FSR_FAULT is used to clear fault bits or to
check for valid faults. Also, the validity of this change does not depend
on the rest of the patch.
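
As a small illustration of what the mask change affects, the sketch below
compares a fault check with and without SS in the mask. The FSR_* names,
fsr_is_fault() and the bit positions are assumptions made for this example.
Only the terms visible in the quoted diff are included; the real macro has
further terms.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define FSR_MULTI (1u << 31)
#define FSR_SS    (1u << 30)
#define FSR_UUT   (1u << 8)
#define FSR_EF    (1u << 4)
#define FSR_PF    (1u << 3)

/* Mask as before the patch (SS included) and as after it (SS dropped). */
#define FSR_FAULT_WITH_SS (FSR_MULTI | FSR_SS | FSR_UUT | FSR_EF | FSR_PF)
#define FSR_FAULT_NO_SS   (FSR_FAULT_WITH_SS & ~FSR_SS)

/* Returns 1 if any bit covered by the mask is set in fsr, else 0. */
static int fsr_is_fault(uint32_t fsr, uint32_t mask)
{
	return (fsr & mask) != 0;
}
```

An FSR value with only SS set counts as a fault under the old mask and not
under the new one, while SS together with a real fault bit such as PF is
still recognised either way.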

Thanks,
Prakash
 
> > 
> > >                                            ARM_SMMU_CB_FSR_UUT |          \
> > >                                            ARM_SMMU_CB_FSR_EF |           \
> > >                                            ARM_SMMU_CB_FSR_PF |           \
> > > 
> > > -- 
> > > 2.47.1
> > > 
>

Re: [PATCH v3 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
