On 12/19/18 2:40 PM, Roger Pau Monné wrote:
> On Wed, Dec 19, 2018 at 12:38:50PM +0000, George Dunlap wrote:
>> On 12/19/18 12:10 PM, Roger Pau Monné wrote:
>>> On Wed, Dec 19, 2018 at 11:40:14AM +0000, George Dunlap wrote:
>>>> On 12/18/18 4:05 PM, Roger Pau Monne wrote:
>>>>> paging_log_dirty_op function takes mm locks from a subject domain and
>>>>> then attempts to perform copy to operations against the caller
>>>>> domain in order to copy the result of the hypercall into the caller
>>>>> provided buffer.
>>>>>
>>>>> This works fine when the caller is a non-paging domain, but triggers a
>>>>> lock order panic when the caller is a paging domain due to the fact
>>>>> that at the point where the copy to operation is performed the subject
>>>>> domain paging lock is locked, and the copy operation requires locking
>>>>> the caller p2m lock which has a lower level.
>>>>>
>>>>> Fix this limitation by adding a bias to the level of the caller domain
>>>>> mm locks, so that the lower caller domain mm lock always has a level
>>>>> greater than the higher subject domain lock level. This allows locking
>>>>> the subject domain mm locks and then locking the caller domain mm
>>>>> locks, while keeping the same lock ordering and the changes mostly
>>>>> confined to mm-locks.h.
>>>>>
>>>>> Note that so far only this flow (locking a subject domain locks and
>>>>> then the caller domain ones) has been identified, but not all possible
>>>>> code paths have been inspected. Hence this solution attempts to be a
>>>>> non-intrusive fix for the problem at hand, without discarding further
>>>>> changes in the future if other valid code paths are found that require
>>>>> more complex lock level ordering.
>>>>>
>>>>> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
>>>>
>>>> As a quick fix I think this general approach is OK; the thing I don't
>>>> like is that it's symmetric.  We don't *expect* to ever have a situation
>>>> where A grabs one of its own MM locks and then one of B's, *and* B then
>>>> grabs one of its own locks and then A's; but it could happen.
>>>
>>> I have not identified such scenario ATM, but we cannot discard future
>>> features needing such interlocking I guess. In any case, I think this
>>> is something that would have to be solved when we came across such
>>> scenario IMO.
>>
>> Right -- and the purpose of these macros is to make sure that we
>> discover such potential deadlocks in testing rather than in production.
>>
>>>> Since we've generally identified dom0 which may be grabbing locks of a
>>>> PVH stubdom, which may be grabbing logs of a normal domU, would it be
>>>> possible / make sense instead to give a 2x bonus for dom0, and a 1x
>>>> bonus for "is_priv_for" domains?
>>>
>>> Jan pointed out such case, but I'm not sure I can see how this is
>>> supposedly to happen even given the scenario above, I have to admit
>>> however I'm not that familiar with the mm code, so it's likely I'm
>>> missing something.
>>>
>>> Hypercalls AFAIK have a single target (or subject) domain, so even if
>>> there's a stubdomain relation I'm not sure I see why that would
>>> require this kind of locking, any domain can perform hypercalls
>>> against a single subject domain, and the hypervisor itself doesn't
>>> even know about stubdomain relations.
>>
>> We're considering three potential cases:
>>
>> A. dom0 makes a hypercall w/ domU as a target.
>> B. dom0 makes a hypercall w/ stubdom as a target.
>> c. stubdom makes a hypercall w/ domU as a target.
>>
>> We could give only dom0 a bonus.  In that case, A and B would work, but
>> C might fail (since stubdom's lock values are the same as domU's).
>>
>> We could give both dom0 and stubdoms a bonus.  In that case, A and C
>> would work, but B might fail (since the stubdom's lock values are the
>> same as dom0's).
>>
>> Or, we could do what I've proposed: give stubdom a bonus, and dom0 a
>> double bonus.  That way all 3 work, since dom0's lock values !=
>> stubdom's lock values, and stubdom's lock values != domU's lock values.
>>
>> On the other hand, starting simple and adding things in as you find you
>> need them isn't a bad approach; so possibly just giving a bonus to dom0
>> is a good place to start.
> 
> IMO just giving a bonus to the caller domain (current->domain) is even
> easier. I've only spotted a single case where there's such
> interleaved domain locking, which is due to a copy_to into a caller
> provided buffer while having some subject's domain mm locks taken.
> 
> On the line of your reply below, I would leave more complex locking
> level adjustment to further patches if there's such a need.

Not sure how it's easier -- one is (current == d), the other is
is_control_domain(d).

Using 'current' means that potential deadlocks which would now cause a
BUG() won't anymore.  I'm fine with not adding extra protections that
aren't there now; but I don't want to remove protections that are.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to