Hi, Julien.
Sorry for the possible format issues.
> No, it is not disabled. But, ipmmu_irq() uses another
mmu->lock. So, I
> think, there won't be a deadlock.
>
> Or I really missed something?
>
> If we worry about ipmmu_tlb_invalidate() which is called
here (to
> perform a flush by request from P2M code, which manages a
page table)
> and from the irq handler (to perform a flush to resume
address
> translation), I could use a tasklet to schedule
ipmmu_tlb_invalidate()
> from the irq handler then. This way we would get this
serialized. What
> do you think?
I am afraid a tasklet is not an option. You need to perform
the TLB
flush when requested otherwise you are introducing a security
issue.
This is because as soon as a region is unmapped in the page
table, we
remove the drop the reference on any page backing that
region. When the
reference is dropped to zero, the page can be reallocated to
another
domain or even Xen. If the TLB flush happen after, then the
guest may
still be able to access the page for a short time if the
translation has
been cached by the any TLB (IOMMU, Processor).
I understand this. I am not proposing to delay a requested by P2M
code TLB flush in any case. I just propose to issue TLB flush
(which we have to perform in case of page faults, to resolve
error condition and resume translations) from a tasklet rather
than from interrupt handler directly. This is the TLB flush I am
speaking about:
https://github.com/otyshchenko1/xen/blob/ipmmu_upstream2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L598
Sorry if I was unclear.
My mistake, I misread what you wrote.
I found the flush in the renesas-bsp and not Linux upstream but it
is not clear why this is actually required. You are not fixing any
translation error. So what this flush will do?
Regarding the placement of the flush, then if you execute in a
tasklet it will likely be done later on when the IRQ has been
acknowledge. What's the implication to delay it?
Looks like, there is no need to put this flush into a tasklet. As I
understand from Shimoda-san's answer it is OK to call flush here.
So, my worry about calling ipmmu_tlb_invalidate() directly from the
interrupt handler is not actual anymore.
----------
This is my understanding regarding the flush purpose here. This
code, just follows the TRM, no more no less,
which mentions about a need to flush TLB after clearing error
status register and updating a page table entries (which, I assume,
means to resolve a reason why translation/page fault error actually
have happened) to resume address translation request.
Well, I don't have the TRM... so my point of reference is Linux. Why
does upstream not do the TLB flush?
I have no idea regarding that. >
I have been told this is an errata on the IPMMU. Is it related to
the series posted on linux-iommu [1]?
I don't think, the TLB flush we are speaking about, is related to
that series [1] somehow. This TLB flush, I think, is just the last
step in a sequence of actions which should be performed when the
error occurs, no more no less. This is how I understand this.
If you have to flush the TLBs in the IRQ context then something has
gone really wrong.
I don't deny that Break-Before-Make is an issue. However, if it is
handled correctly in the P2M code. You should only be there because
there are no mapping in the TLBs for the address accessed. So flushing
the TLBs should be unnecessary, unless your TLB is also caching
invalid entry?
Sorry, I don't quite understand why we need to worry about this flush
too much for a case which won't occur in normal condition (if everything
is correct). Why we can't just consider this flush as a required action,
which needed to exit from the error state and resume stopped address
translation request. The same required action as "clearing error status
flags" before. We are not trying to understand, why is it so necessary
to clear error flags when error happens, or can we end up without
clearing it, for example. We just follow what described in document. The
same, I think, we have for that flush, if described, then should be
followed. Looks like this flush acts as a trigger to unblock stopped
transaction in that particular case.
Different H/W could have different restoring sequences. Some H/W
requires just clearing error status, other H/W requires full
re-initialization in a specific order to recover from the error state.
Please correct me if I am wrong.
But, with one remark, as you have already noted, we are not trying
to handle/fix this fault (update page table entries), driver
doesn't manage page table and is not aware what the page table is.
What is more, it is unclear what actually need to be fixed in the
page table which is a CPU page table as the same time.
I have heard there is a break-before-make sequence when updating
the page table. So, if device in a domain is issuing DMA somewhere
in the middle of updating a page table, theoretically, we might hit
into this fault. In this case the page table is correct and we
don't need to fix anything... Being honest, I have never seen a
fault caused by break-before-make sequence.
Ok, so it looks like you are trying to fix [1]. My first concern
here is there are no ground for someone without access to the TRM
why this is done.
No, I am definitely not trying to fix [1]. I just follow the BSP
driver I am based on, which in turn follows the TRM. I can extend a
comment in the code before calling ipmmu_tlb_invalidate().
The fact that the code is in the BSP and not in Linux is worrying me.
The commit message in the BSP is quite unhelpful to determine the
exact reason.
It either means Linux rejected the patch or this was not submitted.
Either way, this should be understood why such discrepancy.
I failed to find something similar in the ML. So, probably, was not
submitted. Hope, we will be able to clarify a reason.
--
Regards,
Oleksandr Tyshchenko
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel