On Wed Mar 5, 2025 at 10:49 AM GMT, Jan Beulich wrote: > On 27.02.2025 15:36, Alejandro Vallejo wrote: > > On Wed Feb 26, 2025 at 2:05 PM GMT, Jan Beulich wrote: > >> On 24.02.2025 15:49, Alejandro Vallejo wrote: > >>> Open question to whoever reviews this... > >>> > >>> On Mon Feb 24, 2025 at 1:27 PM GMT, Alejandro Vallejo wrote: > >>>> spin_lock(&heap_lock); > >>>> - /* adjust domain outstanding pages; may not go negative */ > >>>> - dom_before = d->outstanding_pages; > >>>> - dom_after = dom_before - pages; > >>>> - BUG_ON(dom_before < 0); > >>>> - dom_claimed = dom_after < 0 ? 0 : dom_after; > >>>> - d->outstanding_pages = dom_claimed; > >>>> - /* flag accounting bug if system outstanding_claims would go > >>>> negative */ > >>>> - sys_before = outstanding_claims; > >>>> - sys_after = sys_before - (dom_before - dom_claimed); > >>>> - BUG_ON(sys_after < 0); > >>>> - outstanding_claims = sys_after; > >>>> + BUG_ON(outstanding_claims < d->outstanding_pages); > >>>> + if ( pages > 0 && d->outstanding_pages < pages ) > >>>> + { > >>>> + /* `pages` exceeds the domain's outstanding count. Zero it out. > >>>> */ > >>>> + outstanding_claims -= d->outstanding_pages; > >>>> + d->outstanding_pages = 0; > >>> > >>> While this matches the previous behaviour, do we _really_ want it? It's > >>> weird, > >>> quirky, and it hard to extend to NUMA-aware claims (which is something in > >>> midway through). > >>> > >>> Wouldn't it make sense to fail the allocation (earlier) if the claim has > >>> run > >>> out? Do we even expect this to ever happen this late in the allocation > >>> call > >>> chain? > >> > >> This goes back to what a "claim" means. Even without any claim, a domain > >> may > >> allocate memory. So a claim having run out doesn't imply allocation has to > >> fail. > > > > Hmmm... but that violates the purpose of the claim infra as far as I > > understand > > it. If a domain may overallocate by (e.g) ballooning in memory it can > > distort the > > ability of another domain to start up, even if it succeeded in its own > > claim. > > Why would that be? As long as we hold back enough memory to cover the claim, > it > shouldn't matter what kind of allocation we want to process. I'd say that a PV > guest starting ballooned ought to be able to deflate its balloon as far as > there was a claim established for it up front.
The fact a domain allocated something does not mean it had it claimed before. A claim is a restriction to others' ability to allocate/claim, not to the domain that made the claim. e.g: 0. host is idle. No domU. 1. dom1 is created with a claim to 10% of host memory and max_mem of 80% of host meomory. 2. dom1 balloons in 70% of host memory. 3. dom1 ballons out 70% of host memory. 4. dom1 now holds a a claim to 80% of host memory. It's all quite perverse. Fortunately, looking at adjacent claims-related code xl seems to default to making a claim prior to populating the physmap and cancelling the claim at the end of the meminit() hook so this is never a real problem. This tells me that the logic intent is to force early failure of populate_physmap and nothing else. It's never active by the time ballooning or memory exchange matter at all. Xen ought to cancel the claim by itself though, toolstack should not NEED to do it. And furthermore, the fact that the claim goes up and down interacts really poorly with the per-node claims I want to implement, because it's just impossible to know if a freed page actually comes from a prior decrease of a claim or not. > > > We might also break the invariant that total claims are strictly >= > > total_avail_pages. > > Same here - I don't see why this would happen as long as all accounting is > working correctly. See previous example. > > > I'm somewhat puzzled at the "why" of having separate concepts for max_mem > > and > > claims. I guess it simply grew the way it did. Reinstating sanity would > > probably involve making max_mem effectively the claim, but that's a ton of > > work I really would rather not do for now. > > To me the two are different (beyond claim being global while max-mem is per- > domain). max-mem is a hard boundary (beyond which allocations _will_ fail), > whereas a claim is a softer one, beyond which allocations _may_ fail. > > Jan What I meant is that I'd rather have a non-oversubscribed memory model by which I can statically partition my memory and make 100% sure the sum of all max_mem of every existing domain never exceeds host capacity. But this is neither the intent of the existing claim infra nor what I'm after right now. I was just thinking out loud in text form. :) Cheers, Alejandro