On 14.09.2022 11:13, Roger Pau Monné wrote:
> On Wed, Sep 14, 2022 at 10:31:34AM +0200, Jan Beulich wrote:
>> On 14.09.2022 10:14, Jan Beulich wrote:
>>> On 13.09.2022 16:50, Roger Pau Monné wrote:
>>>> On Mon, Dec 13, 2021 at 04:12:55PM +0100, Jan Beulich wrote:
>>>>> show_hvm_stack() requires interrupts to be enabled to avoids triggering
>>>>> the consistency check in check_lock() for the p2m lock. To do so in
>>>>> spurious_interrupt() requires adding reentrancy protection / handling
>>>>> there.
>>>>
>>>> There's also an ASSERT(!in_irq()) in _percpu_write_lock() that will
>>>> trigger when trying to acquire the p2m lock from spurious_interrupt()
>>>> context, as p2m_lock() -> mm_write_lock() -> _mm_write_lock ->
>>>> percpu_write_lock().
>>>
>>> s/will/may/ since spurious_interrupt() doesn't itself use irq_enter(),
> 
> do_IRQ() does call irq_enter(), and that's the caller of
> spurious_interrupt() AFAICT.

Hmm, you're right. I was mislead by smp_call_function_interrupt()
explicitly using irq_{enter,exit}(). I guess that should have been
removed in b57458c1d02b ("x86: All vectored interrupts go through
do_IRQ()"). I guess I need to either open-code the variant of in_irq()
I'd need, or (perhaps better for overall state) explicitly irq_exit()
before the check and irq_enter() after the call. Thoughts?

>>> but yes - we could nest inside a lower priority interrupt. I'll make
>>> local_irq_enable() depend on !in_irq().
>>
>> Upon further thought I guess more precautions are necessary: We might
>> have interrupted code holding the P2M lock already, and we might also
>> have interrupted code holding another MM lock precluding acquiring of
>> the P2M lock. All of this probably plays into Andrew's concerns, yet
>> still I don't view it as a viable route to omit the stack dump for HVM
>> domains, and in particular for PVH Dom0. Sadly I can't think of any
>> better approach ...
> 
> Yes, I also had those concerns.  The mm locks are recursive, but
> spurious_interrupt() hitting in the middle of code already holding any
> mm lock is likely to end up triggering the mm lock order checker.

Guarding against this is possible, while ...

> One (likely very risky option ATM) is to introduce a per pCPU flag
> that when set will turn all mm locks into noops, and use it here in
> order to avoid any locking issues.  This could introduce two issues at
> least: first one is how resilient page walking routines are against
> page tables changing under their feet.  The second one is that any
> page table walker p2m helper should avoid doing modifications to the
> p2m, so no P2M_ALLOC or P2M_UNSHARE flags could be used.

... personally I view this as too risky.

Jan

Reply via email to