On 22.02.2021 18:11, Andrew Cooper wrote:
> On 22/02/2021 16:49, Jan Beulich wrote:
>> On 22.02.2021 16:26, Andrew Cooper wrote:
>>> At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
>>> causes Xen to explode on the:
>>>
>>>   BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
>>>
>>> in _domain_destroy().  Instrumenting Xen a little more to highlight where the
>>> modifications to d->refcnt occur:
>>>
>>>   (d6) --- Xen Test Framework ---
>>>   (d6) Environment: PV 64bit (Long mode 4 levels)
>>>   (d6) Testing domain create:
>>>   (d6) Testing x86 PVH Shadow
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5 [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11 [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
>>>   (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
>>>   (d6) (XEN) *** d1 ref 3
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11 [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
>>>   (d6) (XEN) Xen BUG at domain.c:450
>>>   (d6) (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y  Not tainted ]----
>>>   (d6) (XEN) CPU:    0
>>>   (d6) (XEN) RIP:    e008:[<ffff82d040204366>] common/domain.c#_domain_destroy+0x69/0x6b
>>>
>>> the problem becomes apparent.
>>>
>>> First of all, there is a reference count leak -
>>> share_xen_page_with_guest()'s reference isn't freed anywhere.
>>>
>>> However, the main problem is the 4th #DB above is this atomic_set()
>>>
>>>   d->is_dying = DOMDYING_dead;
>>>   if ( hardware_domain == d )
>>>       hardware_domain = old_hwdom;
>>>   printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
>>>   atomic_set(&d->refcnt, DOMAIN_DESTROYED);
>>>
>>> in the domain_create() error path, which happens before
>>> free_domheap_pages() drops the ref acquired by assign_pages(), and
>>> destroys still-relevant information pertaining to the guest.
>> I think the original idea was that by the time of the atomic_set()
>> all operations potentially altering the refcount are done. This
>> then allowed calling free_xenheap_pages() on e.g. the shared info
>> page without regard to whether the page's reference (installed by
>> share_xen_page_with_guest()) was actually dropped (i.e.
>> regardless of whether it's the domain create error path or proper
>> domain cleanup). With this assumption, no actual leak of anything
>> would occur, but of course this isn't the "natural" way of how
>> things should get cleaned up.
>>
>> According to this original model, free_domheap_pages() may not be
>> called anymore past that point (for domain owned pages, which
>> really means put_page() then; anonymous pages are of course fine
>> to be freed late).
> 
> And I presume this is written down in the usual place?  (I.e. nowhere)

I'm afraid so, hence my starting the explanation with "I think ...".
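
To spell out the model as I understand it - this is only a sketch (the
function names match the real code, but the placement is heavily
simplified):

  /* Setup: the shared info page is handed to the guest, taking a ref. */
  d->shared_info = alloc_xenheap_pages(0, MEMF_bits(32));
  share_xen_page_with_guest(virt_to_page(d->shared_info), d, SHARE_rw);

  /*
   * Teardown under the original model: once the refcount has been
   * force-set, the page's outstanding reference is deliberately
   * ignored, and the page is freed directly.
   */
  atomic_set(&d->refcnt, DOMAIN_DESTROYED);
  free_xenheap_page(d->shared_info);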

>>> The best option is probably to use atomic_sub() to subtract
>>> (DOMAIN_DESTROYED + 1) from the current refcount, which preserves the
>>> extra refs taken by share_xen_page_with_guest() and assign_pages()
>>> until they can be freed appropriately.
>> First of all - why DOMAIN_DESTROYED+1? There's no extra reference
>> you ought to be dropping here. Or else what's the counterpart of
>> acquiring the respective reference?
> 
> The original atomic_set(1) needs dropping (somewhere around) here.

Ah, right.
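
I.e., in domain_create()'s error path, something like this (an entirely
untested sketch, and the exact spot would need checking):

  -    atomic_set(&d->refcnt, DOMAIN_DESTROYED);
  +    /*
  +     * Subtracting DOMAIN_DESTROYED sets the destroyed bit (in 2's
  +     * complement, subtracting the high bit is the same as adding
  +     * it), while the extra 1 drops only the initial ref from
  +     * atomic_set(&d->refcnt, 1).  The refs still held by
  +     * share_xen_page_with_guest() and assign_pages() survive.
  +     */
  +    atomic_sub(DOMAIN_DESTROYED + 1, &d->refcnt);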

>> And then of course this means Xen heap pages cannot be cleaned up
>> anymore by merely calling free_xenheap_pages() - to get rid of
>> the associated reference, all of them would need to undergo
>> put_page_alloc_ref(), which in turn requires obtaining an extra
>> reference, which in turn introduces another of these ugly
>> theoretical error cases (because get_page() can in principle fail).
>>
>> Therefore I wouldn't outright discard the option of sticking to
>> the original model. That model would then better be properly
>> described somewhere, and we would likely want to put some check in
>> place to make sure such a put_page() can no longer go unnoticed
>> when it sits too late on the cleanup path (which may be difficult
>> to arrange for).
> 
> I agree that some rules are in desperate need of writing down.
> 
> However, given the catastrophic mess that is our reference counting and
> cleanup paths, and how easy it is to get things wrong, I'm very
> disinclined to permit a rule which forces divergent cleanup logic.
> 
> Making the cleanup paths non-divergent is the way to remove swathes of
> dubious/buggy logic, and to stem a steady stream of memory leaks, etc.
> 
> In particular, I don't think it's acceptable to special-case the cleanup
> rules in the domain_create() error path simply because we blindly reset
> the reference count when it still contains real information.

Of course I agree in principle. The question is whether now is a
good time for such a more extensive rework. IOW I wonder whether
there's an immediate bug to be fixed now, with the broader rework
then done for 4.16.
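
(To illustrate the fragility mentioned above - under the fully
"natural" model every shared Xen heap page would need to undergo
something like the following, again only sketched, with error
handling elided:

  struct page_info *pg = virt_to_page(d->shared_info);

  if ( get_page(pg, d) )      /* temporary guard ref; can in principle
                                 fail */
  {
      put_page_alloc_ref(pg); /* drop the PGC_allocated reference */
      put_page(pg);           /* drop the guard ref; a zero count frees
                                 the page */
  }

with the get_page() failure path being exactly the kind of awkward
"can't really happen" error case I mean.)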

Jan
