On Mon, Feb 24, 2025 at 02:49:48PM +0000, Alejandro Vallejo wrote:
> Open question to whoever reviews this...
>
> On Mon Feb 24, 2025 at 1:27 PM GMT, Alejandro Vallejo wrote:
> >      spin_lock(&heap_lock);
> > -    /* adjust domain outstanding pages; may not go negative */
> > -    dom_before = d->outstanding_pages;
> > -    dom_after = dom_before - pages;
> > -    BUG_ON(dom_before < 0);
> > -    dom_claimed = dom_after < 0 ? 0 : dom_after;
> > -    d->outstanding_pages = dom_claimed;
> > -    /* flag accounting bug if system outstanding_claims would go negative */
> > -    sys_before = outstanding_claims;
> > -    sys_after = sys_before - (dom_before - dom_claimed);
> > -    BUG_ON(sys_after < 0);
> > -    outstanding_claims = sys_after;
> > +    BUG_ON(outstanding_claims < d->outstanding_pages);
> > +    if ( pages > 0 && d->outstanding_pages < pages )
> > +    {
> > +        /* `pages` exceeds the domain's outstanding count. Zero it out. */
> > +        outstanding_claims -= d->outstanding_pages;
> > +        d->outstanding_pages = 0;
>
> While this matches the previous behaviour, do we _really_ want it? It's weird,
> quirky, and it hard to extend to NUMA-aware claims (which is something in
> midway through).
>
> Wouldn't it make sense to fail the allocation (earlier) if the claim has run
> out? Do we even expect this to ever happen this late in the allocation call
> chain?

I'm unsure.  This is the case where more memory than initially claimed
has been allocated, but by the time domain_adjust_tot_pages() gets
called the memory has already been allocated, so it's unhelpful to fail
at that point.

I think any caller that requests more memory than was initially claimed
for the domain should be prepared to deal with the allocation failing.
This quirky handling is very likely a workaround for the miscellaneous
differences between the memory the toolstack accounts for a guest and
the memory that guest really uses.  I bet that if you strictly limit a
guest to allocating at most d->outstanding_pages, domain creation will
fail.

In general the toolstack memory calculations are not fully accurate.
See for example how vmx_alloc_vlapic_mapping() allocates a domheap page
which the toolstack very likely won't have accounted for.  There are
probably other examples that would break the accounting done by the
toolstack.

Thanks, Roger.
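
For illustration, the clamping behaviour discussed above can be modelled
in isolation.  This is a minimal sketch and not the Xen code: the names
model_domain, model_outstanding_claims and model_consume_claim() are
hypothetical, assert() stands in for BUG_ON(), locking is omitted, and
the non-clamping branch (subtracting `pages` from both counters) is an
assumption, since the quoted hunk only shows the clamping branch.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant part of Xen's struct domain. */
struct model_domain {
    unsigned long outstanding_pages; /* pages still claimed by this domain */
};

/* Hypothetical stand-in for the system-wide outstanding_claims counter. */
static unsigned long model_outstanding_claims;

/*
 * Model of consuming part of a claim once `pages` have been allocated.
 * A non-positive `pages` leaves the claim untouched.
 */
static void model_consume_claim(struct model_domain *d, long pages)
{
    unsigned long used;

    /* Stand-in for BUG_ON(outstanding_claims < d->outstanding_pages). */
    assert(model_outstanding_claims >= d->outstanding_pages);

    if ( pages <= 0 )
        return;

    if ( d->outstanding_pages < (unsigned long)pages )
        /* Allocation exceeds the remaining claim: zero the claim out. */
        used = d->outstanding_pages;
    else
        /* Assumed normal case: the claim shrinks by `pages`. */
        used = (unsigned long)pages;

    model_outstanding_claims -= used;
    d->outstanding_pages -= used;
}
```

With a domain claim of 10 pages and a system total of 25, consuming 4
pages leaves 6 and 21.  A later allocation of 100 pages succeeds in this
model and simply drops the domain claim to 0 and the system total to 15,
which is the "allocate beyond the claim without failing" behaviour in
question.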