On 22.06.2020 11:31, Roger Pau Monné wrote:
> On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote:
>> On 18.06.2020 18:04, Roger Pau Monne wrote:
>>> Commit e9aca9470ed86 introduced a regression when avoiding sending
>>> IPIs for certain flush operations. Xen's page fault handler
>>> (spurious_page_fault) relies on blocking interrupts in order to
>>> prevent TLB flush IPIs from being handled, and thus to prevent
>>> other CPUs from removing page-table pages. Switching to assisted
>>> flushing avoids such IPIs, and can thus result in pages belonging
>>> to the page tables being removed (and possibly re-used) while
>>> __page_fault_type is being executed.
>>>
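A minimal illustrative sketch of the synchronization being relied on
here (not Xen source; local_irq_save()/local_irq_restore() are existing
primitives, while the function name is made up for illustration):

    /*
     * While this CPU walks the page tables with interrupts disabled, a
     * remote CPU doing an IPI-based flush_tlb_mask() cannot get its IPI
     * acknowledged, and therefore cannot go on to free the page-table
     * pages the walk is touching.  An assisted (hypervisor) flush skips
     * the IPI and hence skips this implicit synchronization.
     */
    static void walk_like_spurious_page_fault(void)
    {
        unsigned long flags;

        local_irq_save(flags);    /* remote IPI flushes now wait for us */
        /* ... read page-table entries, as __page_fault_type() does ... */
        local_irq_restore(flags); /* remote flush (and page freeing) resumes */
    }
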
>>> Force some of the TLB flushes to use IPIs, thus avoiding the
>>> assisted TLB flush. The selected flushes are the page type change
>>> (when switching from a page-table type to a different one, i.e. a
>>> page that has been removed as a page table) and page allocation.
>>> This sadly has a negative performance impact on the pvshim, as
>>> fewer assisted flushes can be used.
>>>
>>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
>>> using an IPI (flush_tlb_mask_sync). Note that the flag is only
>>> meaningfully defined when the hypervisor supports PV mode, as
>>> otherwise translated domains are in charge of their own page
>>> tables, don't share page tables with Xen, and hence cannot
>>> influence the result of page walks performed by the spurious fault
>>> handler.
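A rough sketch of what the interface described above might look like;
the bit value and the expansion are assumptions for illustration, not
the actual patch (flush_mask() and FLUSH_TLB are existing x86
interfaces):

    /* Force the flush to be performed via IPI even if an assisted flush
     * is available (value chosen here purely for illustration). */
    #define FLUSH_FORCE_IPI  0x8000

    /*
     * Flush TLBs on the CPUs in @mask, always using an IPI, so the
     * caller also gets the "remote CPUs have finished any
     * interrupts-off page-table walk" guarantee discussed above.
     */
    #define flush_tlb_mask_sync(mask) \
        flush_mask(mask, FLUSH_TLB | FLUSH_FORCE_IPI)
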
>>
>> Is this true for shadow mode? If a page shadowing a guest one was
>> given back quickly enough to the allocator and then re-used, I think
>> the same situation could in principle arise.
> 
> Hm, I think it's not applicable to HVM shadow mode at least, because
> CR3 is switched as part of vmentry/vmexit, and the page tables are
> not shared between Xen and the guest, so there's no way for an HVM
> shadow guest to modify the page tables while Xen is walking them in
> spurious_page_fault (note that spurious_page_fault is only called
> when the fault happens in non-guest context).

I'm afraid I disagree, because of shadow's use of "linear page tables".

>>> Note the flag is not defined on Arm, and the introduced helper is just
>>> a dummy alias to the existing flush_tlb_mask.
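On Arm the fallback presumably collapses to something like the
following (a sketch, not the actual hunk):

    /* Arm: the IPI distinction is irrelevant, just reuse the existing helper. */
    #define flush_tlb_mask_sync(mask) flush_tlb_mask(mask)
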
>>>
>>> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
>>> Reported-by: Andrew Cooper <andrew.coop...@citrix.com>
>>> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
>>> ---
>>> It's my understanding that not doing such IPI flushes could lead to
>>> the page tables read by __page_fault_type being modified by a third
>>> party, which could make them point to other mfns outside the scope
>>> of the guest's allowed physical memory addresses. However those
>>> accesses would be limited to __page_fault_type, and hence the main
>>> worry would be that a guest could make it read from a physical
>>> memory region that has side effects?
>>
>> I don't think so, no - the memory could be changed such that the
>> PTEs are invalid altogether (like having reserved bits set). Consider
>> for example reading, out of such a PTE, an MFN that's larger than
>> the physical address width supported by the CPU. Afaict
>> map_domain_page() will happily install a respective page table
>> entry, but we'd get a reserved-bit #PF upon reading from that
>> mapping.
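To make that failure mode concrete, a hedged illustration; paddr_bits
and PAGE_SHIFT are existing Xen symbols, while the helper below is only
for exposition:

    /*
     * If a stale or concurrently modified PTE yields an MFN beyond the
     * CPU's physical address width, map_domain_page() will still create
     * a mapping for it, but dereferencing that mapping raises a
     * reserved-bit #PF instead of merely reading unexpected memory.
     */
    static bool mfn_within_cpu_paddr(unsigned long mfn)
    {
        return !(mfn >> (paddr_bits - PAGE_SHIFT));
    }
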
> 
> So there are no hazards from executing __page_fault_type against a
> page-table that could be modified by a user?

There are - I realize only now that the way I started my earlier
reply was ambiguous. I meant to negate your "only with side effects"
way of thinking.

Jan
