On Mon Aug 12, 2024 at 11:25 AM AEST, Richard Henderson wrote:
> On 8/9/24 17:47, Nicholas Piggin wrote:
> > This is not a clean patch, but does fix a problem I hit with TB
> > invalidation due to the target software writing to memory with TBs.
> >
> > Lockup messages are triggering in Linux due to page clearing taking a
> > long time when a code page has been freed, because it takes a lot of
> > notdirty notifiers, which massively slows things down. Linux might
> > possibly have a bug here too because it seems to hang indefinitely in
> > some cases, but even if it didn't, the latency of clearing these pages
> > is very high.
> >
> > This showed when running KVM on the emulated machine, starting and
> > stopping guests. That causes lots of instruction pages to be freed.
> > Usually if you're just running Linux, executable pages remain in
> > pagecache so you get fewer of these bombs in the kernel memory
> > allocator. But page reclaim, JITs, deleting executable files, etc.,
> > could trigger it too.
> >
> > Invalidating all TBs from the page on any hit seems to avoid the
> > problem and generally speeds things up.
> >
> > How important is the precise invalidation? These days I assume the
> > tricky kind of SMC that frequently writes code close to where it's
> > executing is pretty rare and might not be something we really care
> > about for performance. Could we remove sub-page TB invalidation
> > entirely?
>
> Happens on x86 and s390 regularly enough, so we can't remove it.
>
> > @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> >      TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> >  #endif /* TARGET_HAS_PRECISE_SMC */
> >
> > +    start &= TARGET_PAGE_MASK;
> > +    last |= ~TARGET_PAGE_MASK;
> > +
> >      /* Range may not cross a page. */
> >      tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>
> This would definitely break SMC.
They can't invalidate the instruction currently being executed? I'll
experiment a bit more.

> However, there's a better solution. We're already iterating over all
> of the TBs on the current page only. Move *everything* except the
> tb_phys_invalidate__locked call into the SMC ifdef, and unconditionally
> invalidate every TB selected in the loop.

Okay. I suspect *most* of the time even the strict SMC archs would not
be writing to the same page they're executing either. But I can start
with the !SMC case.

> We experimented with something like this for aarch64, which used to
> spend a lot of the kernel startup time invalidating code pages from
> the (somewhat bloated) EDK2 bios. But it turned out the bigger problem
> was address space randomization, and with CF_PCREL the problem
> appeared to go away.

Interesting.

> I don't think we've done any kvm-under-tcg performance testing, but
> lockup messages would certainly be something to look for...

Yeah, actually Linux is throwing the messages a bit more recently, at
least on distros that enable page clearing at alloc for security,
because that clearing is a big chunk that can happen in critical
sections.

Thanks for the suggestion, I'll give it a try.

Thanks,
Nick