On 12/14/20 2:23 PM, Rebecca Cran wrote:
> ARMv8.4 adds the mandatory FEAT_TLBIRANGE, which provides instructions
> for invalidating ranges of entries.
>
> Signed-off-by: Rebecca Cran <rebe...@nuviainc.com>
> ---
>  accel/tcg/cputlb.c      |  24 ++
>  include/exec/exec-all.h |  39 +++
>  target/arm/helper.c     | 273 ++++++++++++++++++++
>  3 files changed, 336 insertions(+)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 42ab79c1a582..103f363b42f3 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -603,6 +603,30 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>      tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
>  }
>
> +void tlb_flush_page_range_by_mmuidx(CPUState *cpu, target_ulong addr,
> +                                    int num_pages, uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx(cpu, addr + (i * TARGET_PAGE_SIZE), idxmap);
> +    }
> +}
> +
> +void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
> +                                                    target_ulong addr,
> +                                                    int num_pages,
> +                                                    uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu,
> +                                                 addr + (i * TARGET_PAGE_SIZE),
> +                                                 idxmap);
> +    }
> +}
This is a poor way to structure these functions, because each of these
calls is synchronized. You want to do the cross-cpu call once for the
entire set of pages, synchronizing once at the end.

In addition, tlb_flush_page is insufficient for aarch64, because of TBI.
We need a version of tlb_flush_page_bits that takes the length of the
flush. This *could* be implemented as a full flush, in the short term.
You could round the length outward to a mask, then merge the low-bit
mask of the length with the high-bit mask of TBI. That will catch a few
more pages than architecturally required, but less than a full flush.

Certainly I don't think you ever want to perform this loop
32 (max num) * 16 (max scale) * 64 (max page size) = 32768 times.
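Something like the following, purely as a sketch (untested; the names
TLBFlushRangeData, tlb_flush_range_async and flush_range_mask are made
up here for illustration, while tlb_flush_page_by_mmuidx_async_0 is
meant to be the existing per-page worker in cputlb.c):

typedef struct {
    target_ulong addr;
    int num_pages;
    uint16_t idxmap;
} TLBFlushRangeData;

/* Flush the whole range on one cpu; runs via async_run_on_cpu. */
static void tlb_flush_range_async(CPUState *cpu, run_on_cpu_data data)
{
    TLBFlushRangeData *d = data.host_ptr;
    int i;

    for (i = 0; i < d->num_pages; i++) {
        tlb_flush_page_by_mmuidx_async_0(cpu, d->addr + i * TARGET_PAGE_SIZE,
                                         d->idxmap);
    }
    g_free(d);
}

void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                                    target_ulong addr,
                                                    int num_pages,
                                                    uint16_t idxmap)
{
    TLBFlushRangeData d = {
        .addr = addr, .num_pages = num_pages, .idxmap = idxmap
    };
    CPUState *dst_cpu;

    /* Queue one piece of work per cpu, covering the entire range. */
    CPU_FOREACH(dst_cpu) {
        if (dst_cpu != src_cpu) {
            async_run_on_cpu(dst_cpu, tlb_flush_range_async,
                             RUN_ON_CPU_HOST_PTR(g_memdup(&d, sizeof(d))));
        }
    }

    /* Synchronize exactly once, at the end, via the exclusive section. */
    async_safe_run_on_cpu(src_cpu, tlb_flush_range_async,
                          RUN_ON_CPU_HOST_PTR(g_memdup(&d, sizeof(d))));
}

For the TBI interaction, the mask merge could look something like this
(again just a sketch, assuming len >= TARGET_PAGE_SIZE so clz64 never
sees zero, and len_bits <= bits):

/*
 * Merge the low-bit mask of the outward-rounded length with the
 * high-bit mask of TBI.  'bits' is the number of significant low
 * address bits, as for tlb_flush_page_bits_by_mmuidx (e.g. 56 when
 * the top byte is ignored); 'len' is the length of the flush in bytes.
 * An entry is flushed when its address matches 'addr' under this mask:
 * a few more pages than architecturally required, but far fewer than
 * a full flush.
 */
static target_ulong flush_range_mask(target_ulong len, unsigned bits)
{
    /* Round len outward to a power of two: ignore that many low bits. */
    unsigned len_bits = 64 - clz64(len - 1);

    /* Keep only bits [len_bits, bits). */
    return MAKE_64BIT_MASK(len_bits, bits - len_bits);
}

r~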