On Mon, Jun 3, 2024 at 4:03 PM Sean Christopherson <sea...@google.com> wrote:
>
> On Mon, Jun 03, 2024, James Houghton wrote:
> > On Thu, May 30, 2024 at 11:06 PM Yu Zhao <yuz...@google.com> wrote:
> > > What I don't think is acceptable is simplifying those optimizations
> > > out without documenting your justifications (I would even call it a
> > > design change, rather than simplification, from v3 to v4).
> >
> > I'll put back something similar to what you had before (like a
> > test_clear_young() with a "fast" parameter instead of "bitmap"). I
> > like the idea of having a new mmu notifier, like
> > fast_test_clear_young(), while leaving test_young() and clear_young()
> > unchanged (where "fast" means "prioritize speed over accuracy").
>
> Those two statements are contradicting each other, aren't they?

I guess it depends on how you define "similar". :)

> Anyways, I vote
> for a "fast only" variant, e.g. test_clear_young_fast_only() or so.  gup() has
> already established that terminology in mm/, so hopefully it would be familiar
> to readers.  We could pass a param, but then the MGLRU code would likely end
> up doing a bunch of useless indirect calls into secondary MMUs, whereas a
> dedicated hook allows implementations to nullify the pointer if the API isn't
> supported for whatever reason.
>
> And pulling in Oliver's comments about locking, I think it's important that
> the mmu_notifier API express its requirement that the operation be "fast",
> not that it be lockless.  E.g. if a secondary MMU can guarantee that a lock
> will be contended only in rare, slow cases, then taking a lock is a-ok.  Or a
> secondary MMU could do try-lock and bail if the lock is contended.
>
> That way KVM can honor the intent of the API with an implementation that works
> best for KVM _and_ for MGLRU.  I'm sure there will be future adjustments and
> fixes, but that's just more motivation for using something like "fast only"
> instead of "lockless".

Yes, thanks, this is exactly what I meant. I really should have "only"
in the name to signify that it is a requirement that it be fast.
Thanks for wording it so clearly.
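
A minimal userspace sketch of the two ideas above (a dedicated, optional
hook that can be left NULL, plus try-lock-and-bail), assuming pthread
mutexes as a stand-in for mmu_lock. This is not kernel code: the structs
and helper names are hypothetical, and test_clear_young_fast_only() is
only the name floated in this thread, not an existing API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a secondary MMU (not a kernel structure). */
struct secondary_mmu {
	pthread_mutex_t lock;	/* plays the role of mmu_lock */
	bool young;		/* one "accessed" bit for the whole range */
};

/* Hypothetical ops table, loosely modeled on the notifier idea. */
struct notifier_ops {
	/* Always provided; may block on the lock. */
	bool (*test_clear_young)(struct secondary_mmu *mmu);
	/* Optional; left NULL when fast aging is not supported. */
	bool (*test_clear_young_fast_only)(struct secondary_mmu *mmu);
};

/* Accurate variant: waits for the lock, then tests and clears the bit. */
static bool slow_test_clear_young(struct secondary_mmu *mmu)
{
	bool young;

	pthread_mutex_lock(&mmu->lock);
	young = mmu->young;
	mmu->young = false;
	pthread_mutex_unlock(&mmu->lock);
	return young;
}

/*
 * "Fast only" variant: taking a lock is fine as long as we never wait.
 * On contention, bail and report "not young" (speed over accuracy).
 */
static bool fast_only_test_clear_young(struct secondary_mmu *mmu)
{
	bool young;

	if (pthread_mutex_trylock(&mmu->lock) != 0)
		return false;
	young = mmu->young;
	mmu->young = false;
	pthread_mutex_unlock(&mmu->lock);
	return young;
}

/* Caller side: a NULL hook means the call is skipped entirely. */
static bool age_fast(const struct notifier_ops *ops, struct secondary_mmu *mmu)
{
	if (!ops->test_clear_young_fast_only)
		return false;
	return ops->test_clear_young_fast_only(mmu);
}
```

With a dedicated hook the caller needs only the NULL check to know
whether the indirect call is worth making, whereas a "fast" parameter on
the existing hook would have to be passed down into every implementation
before it could decline.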

>
> > > > I made this logic change as part of removing batching.
> > > >
> > > > I'd really appreciate guidance on what the correct thing to do is.
> > > >
> > > > In my mind, what would work great is: by default, do aging exactly
> > > > when KVM can do it locklessly, and then have a Kconfig to always have
> > > > MGLRU do aging with KVM if a user really cares about proactive
> > > > reclaim (when the feature bit is set). The selftest can check the
> > > > Kconfig + feature bit to know for sure if aging will be done.
> > >
> > > I still don't see how that Kconfig helps. Or why the new static branch
> > > isn't enough?
> >
> > Without a special Kconfig, the feature bit just tells us that aging
> > with KVM is possible, not that it will necessarily be done. For the
> > self-test, it'd be good to know exactly when aging is being done or
> > not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
> > help make the self-test set the right expectations for aging.
> >
> > The Kconfig would also allow a user to know that, no matter what,
> > we're going to get correct age data for VMs, even if, say, we're using
> > the shadow MMU.
>
> Heh, unless KVM flushes, you won't get "correct" age data.
>
> > This is somewhat important for me/Google Cloud. Is that reasonable? Maybe
> > there's a better solution.
>
> Hmm, no?  There's no reason to use a Kconfig, e.g. if we _really_ want to
> prioritize accuracy over speed, then a KVM (x86?) module param to have KVM
> walk nested TDP page tables would give us what we want.
>
> But before we do that, I think we need to perform due diligence (or provide
> data) showing that having KVM take mmu_lock for write in the "fast only" API
> provides better total behavior.  I.e. that the additional accuracy is indeed
> worth the cost.

That sounds good to me. I'll drop the Kconfig. I'm not really sure
what to do about the self-test, but that's not really all that
important.
