On 26/02/2025 4:06 pm, Jan Beulich wrote: > On 26.02.2025 17:04, Roger Pau Monné wrote: >> On Wed, Feb 26, 2025 at 03:36:33PM +0100, Jan Beulich wrote: >>> On 26.02.2025 15:28, Roger Pau Monné wrote: >>>> On Wed, Feb 26, 2025 at 03:08:33PM +0100, Jan Beulich wrote: >>>>> On 26.02.2025 14:56, Roger Pau Monné wrote: >>>>>> On Mon, Feb 24, 2025 at 01:27:24PM +0000, Alejandro Vallejo wrote: >>>>>>> --- a/xen/common/page_alloc.c >>>>>>> +++ b/xen/common/page_alloc.c >>>>>>> @@ -490,13 +490,11 @@ static long outstanding_claims; /* total >>>>>>> outstanding claims by all domains */ >>>>>>> >>>>>>> unsigned long domain_adjust_tot_pages(struct domain *d, long pages) >>>>>>> { >>>>>>> - long dom_before, dom_after, dom_claimed, sys_before, sys_after; >>>>>>> - >>>>>>> ASSERT(rspin_is_locked(&d->page_alloc_lock)); >>>>>>> d->tot_pages += pages; >>>>>>> >>>>>>> /* >>>>>>> - * can test d->claimed_pages race-free because it can only change >>>>>>> + * can test d->outstanding_pages race-free because it can only >>>>>>> change >>>>>>> * if d->page_alloc_lock and heap_lock are both held, see also >>>>>>> * domain_set_outstanding_pages below >>>>>>> */ >>>>>>> @@ -504,17 +502,16 @@ unsigned long domain_adjust_tot_pages(struct >>>>>>> domain *d, long pages) >>>>>>> goto out; >>>>>> I think you can probably short-circuit the logic below if pages == 0? >>>>>> (and avoid taking the heap_lock) >>>>> Are there callers passing in 0? >>>> Not sure, but if there are no callers expected we might add an ASSERT >>>> to that effect then. >>>> >>>>>>> spin_lock(&heap_lock); >>>>>>> - /* adjust domain outstanding pages; may not go negative */ >>>>>>> - dom_before = d->outstanding_pages; >>>>>>> - dom_after = dom_before - pages; >>>>>>> - BUG_ON(dom_before < 0); >>>>>>> - dom_claimed = dom_after < 0 ? 0 : dom_after; >>>>>>> - d->outstanding_pages = dom_claimed; >>>>>>> - /* flag accounting bug if system outstanding_claims would go >>>>>>> negative */ >>>>>>> - sys_before = outstanding_claims; >>>>>>> - sys_after = sys_before - (dom_before - dom_claimed); >>>>>>> - BUG_ON(sys_after < 0); >>>>>>> - outstanding_claims = sys_after; >>>>>>> + BUG_ON(outstanding_claims < d->outstanding_pages); >>>>>>> + if ( pages > 0 && d->outstanding_pages < pages ) >>>>>>> + { >>>>>>> + /* `pages` exceeds the domain's outstanding count. Zero it >>>>>>> out. */ >>>>>>> + outstanding_claims -= d->outstanding_pages; >>>>>>> + d->outstanding_pages = 0; >>>>>>> + } else { >>>>>>> + outstanding_claims -= pages; >>>>>>> + d->outstanding_pages -= pages; >>>>>> I wonder if it's intentional for a pages < 0 value to modify >>>>>> outstanding_claims and d->outstanding_pages, I think those values >>>>>> should only be set from domain_set_outstanding_pages(). >>>>>> domain_adjust_tot_pages() should only decrease the value, but never >>>>>> increase either outstanding_claims or d->outstanding_pages. >>>>>> >>>>>> At best the behavior is inconsistent, because once >>>>>> d->outstanding_pages reaches 0 there will be no further modification >>>>>> from domain_adjust_tot_pages(). >>>>> Right, at that point the claim has run out. While freeing pages with an >>>>> active claim means that the claim gets bigger (which naturally needs >>>>> reflecting in the global). >>>> domain_adjust_tot_pages() is not exclusively called when freeing >>>> pages, see steal_page() for example. >>> Or also when pages were allocated. steal_page() ... >>> >>>> When called from steal_page() it's wrong to increase the claim, as >>>> it assumes that the page removed from d->tot_pages is freed, but >>>> that's not the case. The domain might end up in a situation where >>>> the claim is bigger than the available amount of memory. >>> ... is a case that likely wasn't considered when the feature was added. >>> >>> I never really liked this; I'd be quite happy to see it ripped out, as >>> long as we'd be reasonably certain it isn't in active use by people. >> What do you mean with 'it' in the above sentence, the whole claim >> stuff? > Yes. > >> Or just getting rid of allowing the claim to increase as a >> result of pages being removed from a domain? > No.
Alejandro and I discussed this earlier in the week. The claim infrastructure stuff is critical for a toolstack capable of doing things in parallel. However, it is also nonsensical for there to be a remaining claim by the time domain construction is done. If creation_finished were a concrete thing, rather than a bodge hacked into domain_unpause_by_systemcontroller(), it ought to be made to fail if there were an outstanding claim. I suggested that we follow through on a previous suggestion of making it a real hypercall (which is needed by the encrypted VM crowd too). ~Andrew