On 12/18/18 4:05 PM, Roger Pau Monne wrote:
> paging_log_dirty_op function takes mm locks from a subject domain and
> then attempts to perform copy to operations against the caller
> domain in order to copy the result of the hypercall into the caller
> provided buffer.
> 
> This works fine when the caller is a non-paging domain, but triggers a
> lock order panic when the caller is a paging domain due to the fact
> that at the point where the copy to operation is performed the subject
> domain paging lock is locked, and the copy operation requires locking
> the caller p2m lock which has a lower level.
> 
> Fix this limitation by adding a bias to the level of the caller domain
> mm locks, so that the lower caller domain mm lock always has a level
> greater than the higher subject domain lock level. This allows locking
> the subject domain mm locks and then locking the caller domain mm
> locks, while keeping the same lock ordering and the changes mostly
> confined to mm-locks.h.
> 
> Note that so far only this flow (locking a subject domain locks and
> then the caller domain ones) has been identified, but not all possible
> code paths have been inspected. Hence this solution attempts to be a
> non-intrusive fix for the problem at hand, without discarding further
> changes in the future if other valid code paths are found that require
> more complex lock level ordering.
> 
> Signed-off-by: Roger Pau Monné <roger....@citrix.com>

As a quick fix I think this general approach is OK; the thing I don't
like is that it's symmetric.  We don't *expect* to ever have a situation
where A grabs one of its own MM locks and then one of B's, *and* B then
grabs one of its own locks and then A's; but it could happen.

Since we've generally identified dom0 which may be grabbing locks of a
PVH stubdom, which may be grabbing logs of a normal domU, would it be
possible / make sense instead to give a 2x bonus for dom0, and a 1x
bonus for "is_priv_for" domains?

That way we enforce that all target locks are acquired before all
stubomain locks, and all stubdomain locks are acquired before all dom0
locks: we have a total ordering of locks (not counting between "peer" VMs).

I suspect, however, that at some point we'll discover some path where
the order will need to be reversed; at which point we'll need to figure
out something else.

I also think it would be good to start recording specific instances of
ordering requirements, so that we don't have to go track them down.
(It's not immediately obvious to me, for instance, why the paging lock
is so far down the list.)  Could you add somewhere into the comments in
this section something like this:

paging_log_dirty_op grabs tgt paging_lock, then does copy_to_user which
grabs dom0's P2M lock.

The other thing we could do is generate the lock level as (OWNER_TYPE <<
11) | (LOCK_TYPE << 5) | (domain_id), where OWNER_TYPE is 2 for dom0, 1
for stub domains, 0 for normal domains, and LOCK_TYPE is the current
lock level; and then fail if the new lock level >= current lock level.
That would flag up any potential inter-domain deadlocks as well.

Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to