On 08/08/2019 16:04, Oleksandr wrote:
Hi
Sorry for the possible format issues.
> No, it is not disabled. But, ipmmu_irq() uses another mmu->lock. So, I
> think, there won't be a deadlock.
>
> Or I really missed something?
>
> If we worry about ipmmu_tlb_invalidate() which is called here (to
> perform a flush by request from P2M code, which manages a page table)
> and from the irq handler (to perform a flush to resume address
> translation), I could use a tasklet to schedule ipmmu_tlb_invalidate()
> from the irq handler then. This way we would get this serialized. What
> do you think?
I am afraid a tasklet is not an option. You need to perform the TLB
flush when requested otherwise you are introducing a security issue.
This is because as soon as a region is unmapped in the page table, we
remove the drop the reference on any page backing that region. When the
reference is dropped to zero, the page can be reallocated to another
domain or even Xen. If the TLB flush happen after, then the guest may
still be able to access the page for a short time if the translation has
been cached by the any TLB (IOMMU, Processor).
I understand this. I am not proposing to delay a requested by P2M code TLB
flush in any case. I just propose to issue TLB flush (which we have to
perform in case of page faults, to resolve error condition and resume
translations) from a tasklet rather than from interrupt handler directly.
This is the TLB flush I am speaking about:
https://github.com/otyshchenko1/xen/blob/ipmmu_upstream2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L598
Sorry if I was unclear.
My mistake, I misread what you wrote.
I found the flush in the renesas-bsp and not Linux upstream but it is not
clear why this is actually required. You are not fixing any translation
error. So what this flush will do?
Regarding the placement of the flush, then if you execute in a tasklet it
will likely be done later on when the IRQ has been acknowledge. What's the
implication to delay it?
Looks like, there is no need to put this flush into a tasklet. As I
understand from Shimoda-san's answer it is OK to call flush here.
So, my worry about calling ipmmu_tlb_invalidate() directly from the interrupt
handler is not actual anymore.
----------
This is my understanding regarding the flush purpose here. This code, just
follows the TRM, no more no less,
which mentions about a need to flush TLB after clearing error status register
and updating a page table entries (which, I assume, means to resolve a reason
why translation/page fault error actually have happened) to resume address
translation request.
Well, I don't have the TRM... so my point of reference is Linux. Why does
upstream not do the TLB flush?
I have no idea regarding that. >
I have been told this is an errata on the IPMMU. Is it related to the series
posted on linux-iommu [1]?
I don't think, the TLB flush we are speaking about, is related to that series
[1] somehow. This TLB flush, I think, is just the last step in a sequence of
actions which should be performed when the error occurs, no more no less. This
is how I understand this.
If you have to flush the TLBs in the IRQ context then something has gone really
wrong.
I don't deny that Break-Before-Make is an issue. However, if it is handled
correctly in the P2M code. You should only be there because there are no mapping
in the TLBs for the address accessed. So flushing the TLBs should be
unnecessary, unless your TLB is also caching invalid entry?
But, with one remark, as you have already noted, we are not trying to
handle/fix this fault (update page table entries), driver doesn't manage page
table and is not aware what the page table is. What is more, it is unclear
what actually need to be fixed in the page table which is a CPU page table as
the same time.
I have heard there is a break-before-make sequence when updating the page
table. So, if device in a domain is issuing DMA somewhere in the middle of
updating a page table, theoretically, we might hit into this fault. In this
case the page table is correct and we don't need to fix anything... Being
honest, I have never seen a fault caused by break-before-make sequence.
Ok, so it looks like you are trying to fix [1]. My first concern here is there
are no ground for someone without access to the TRM why this is done.
No, I am definitely not trying to fix [1]. I just follow the BSP driver I am
based on, which in turn follows the TRM. I can extend a comment in the code
before calling ipmmu_tlb_invalidate().
The fact that the code is in the BSP and not in Linux is worrying me. The commit
message in the BSP is quite unhelpful to determine the exact reason.
It either means Linux rejected the patch or this was not submitted. Either way,
this should be understood why such discrepancy.
Cheers,
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel