On Mon Aug 12, 2024 at 11:25 AM AEST, Richard Henderson wrote:
> On 8/9/24 17:47, Nicholas Piggin wrote:
> > This is not a clean patch, but does fix a problem I hit with TB
> > invalidation due to the target software writing to memory with TBs.
> > 
> > Lockup messages are triggering in Linux due to page clearing taking a
> > long time when a code page has been freed, because it hits a lot of
> > notdirty notifiers, which massively slows things down. Linux might
> > possibly have a bug here too because it seems to hang indefinitely in
> > some cases, but even if it didn't, the latency of clearing these pages
> > is very high.
> > 
> > This showed when running KVM on the emulated machine, starting and
> > stopping guests. That causes lots of instruction pages to be freed.
> > Usually if you're just running Linux, executable pages remain in
> > pagecache so you get fewer of these bombs in the kernel memory
> > allocator. But page reclaim, JITs, deleting executable files, etc.,
> > could trigger it too.
> > 
> > Invalidating all TBs from the page on any hit seems to avoid the problem
> > and generally speeds things up.
> > 
> > How important is the precise invalidation? These days I assume the
> > tricky kind of SMC that frequently writes code close to where it's
> > executing is pretty rare and might not be something we really care about
> > for performance. Could we remove sub-page TB invalidation entirely?
>
> Happens on x86 and s390 regularly enough, so we can't remove it.
>
> > @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> >       TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> >   #endif /* TARGET_HAS_PRECISE_SMC */
> >   
> > +    start &= TARGET_PAGE_MASK;
> > +    last |= ~TARGET_PAGE_MASK;
> > +
> >       /* Range may not cross a page. */
> >       tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>
> This would definitely break SMC.

They can't invalidate the instruction currently being executed?
I'll experiment a bit more.

> However, there's a better solution.  We're already iterating over all of
> the TBs on the current page only.  Move *everything* except the
> tb_phys_invalidate__locked call into the SMC ifdef, and unconditionally
> invalidate every TB selected in the loop.

Okay. I suspect *most* of the time even the strict SMC archs would
not be writing to the same page they're executing either. But I can
start with the !SMC case.

> We experimented with something like this for aarch64, which used to
> spend a lot of the kernel startup time invalidating code pages from the
> (somewhat bloated) EDK2 bios.  But it turned out the bigger problem was
> address space randomization, and with CF_PCREL the problem appeared to
> go away.

Interesting.

> I don't think we've done any kvm-under-tcg performance testing, but
> lockup messages would certainly be something to look for...

Yeah, actually Linux has been throwing those messages a bit more often
recently, at least on distros that enable page clearing at allocation
for security, because that clearing is a big chunk of work that can
happen in critical sections.

Thanks for the suggestion, I'll give it a try.

Thanks,
Nick
