Hi Julien,

On 25/06/2025 16:53, Julien Grall wrote:

>Hi Jahan,

>>>> +    dsb(sy);
>>> Any clue why Linux (mainline) does not do that?

> One process remark, we typically comment inline rather than pasting a quote 
> and replying at the top of the e-mail.

Thanks for the style note - I'll follow the inline commenting convention
going forward.

>> Linux's writel() implementation contains an implicit dsb(st), which is likely
>> sufficient for its Stage-1 IOMMU usage, where CPU and IOMMU
>> interactions are coherent.
>> However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA
>> operations (such as PCIe passthrough), requiring the stronger dsb(sy) to
>> ensure writes fully propagate to the IPMMU hardware before continuing.

> I don't follow. Are you saying the IPMMU driver in Linux doesn't do
> non-coherent DMA operations?

Let me clarify my understanding: in native Linux, the IOMMU works at stage 1
(VA -> PA) and typically assumes coherency between CPU and IOMMU, so the
implicit dsb(st) in writel() is enough there. In Xen, we use the IPMMU at
stage 2 (GPA -> HPA) for cases like PCI passthrough, where devices might be
non-coherent. We might need the stronger dsb(sy) barrier in Xen because:
1) we can't assume the TLB walker is coherent for stage 2, and 2) we must also
prevent (or at least minimise) DMA operations during TLB invalidation (I
noticed some IPMMU hardware limitations in the documentation).

> But even if that's the case, I still don't see why non-coherent DMA would 
> matter. From my understanding, here we want to make sure the TLB walker sees 
> the change before the flush.
> So if the TLB walker is coherent with the rest of the system, then it would
> be similar to the CPU TLBs where we only need a "dsb st" (well we use "nshst" 
> because the TLB is in non-shareable domain).
> If the walker is not coherent, then that's a different topic.

You're correct that dsb(st) would suffice in an ideal coherent system. However, 
with PCI passthrough we must handle non-coherent devices. While dsb(st) ensures 
writes complete, dsb(sy) provides the stronger system-wide visibility we need - 
guaranteeing all components (including non-coherent devices) see the changes 
before proceeding.
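
To make the ordering concrete, here is a rough sketch in C of the sequence
under discussion: write a page-table entry, issue the barrier, then request
the flush. This is not the actual driver code. update_and_flush(), its pte and
flush_reg parameters, and the non-arm64 fallback for dsb() are all made up for
illustration so that the snippet builds off-target.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only. On arm64, dsb() emits the real instruction; on other
 * architectures it falls back to a full fence (an assumption, made so the
 * sketch compiles anywhere).
 */
#if defined(__aarch64__)
#define dsb(opt) __asm__ volatile("dsb " #opt ::: "memory")
#else
#define dsb(opt) __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/*
 * Hypothetical helper: pte stands in for a stage-2 page-table entry and
 * flush_reg for the IPMMU flush register.
 */
static void update_and_flush(volatile uint64_t *pte, uint64_t val,
                             volatile uint32_t *flush_reg)
{
    assert(pte && flush_reg);

    *pte = val;      /* 1. write the new page-table entry */
    dsb(sy);         /* 2. barrier before the flush; dsb(st) is the weaker option */
    *flush_reg = 1;  /* 3. request the TLB flush */
}
```

Swapping dsb(sy) for dsb(st) in step 2 is the only difference between the two
options being compared.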

> Anyway, I am not against using "dsb(sy)". It is stronger than necessary but 
> also probably not a massive deal in the TLB flush path.

Thank you. I agree the performance impact is negligible in the flush path, and
it's better to be safe when dealing with passthrough devices in Xen.

Regards,
Jahan Murudi
