Thanks Mark, Looks like some worthwhile improvments to be had. I've added a couple of comments inline below.
> +#define PAGE_64K (64UL * 1024) +#define PAGE_2M (2UL * 1024 * 1024) +#define > PAGE_1G (1UL * 1024 * 1024 * 1024) include/linux/sizes.h includes definitions for SZ_64K, SZ_2M, SZ_1G, etc. so unless they're redefined here for some reason I personally think it's cleaner to use those. > /* > - * Invalidate either a single address or an entire PID depending on > - * the value of va. > + * Invalidate a virtual address range > */ > -static void mmio_invalidate(struct npu_context *npu_context, int va, > - unsigned long address, bool flush) > +static void mmio_invalidate(struct npu_context *npu_context, > + unsigned long start, unsigned long size, bool flush) With this optimisation every caller of mmio_invalidate() sets flush == true so it no longer appears to be used. We should drop it as a parameter unless you think there might be some reason to use it in future? Therefore we could also drop it as a parameter to get_atsd_launch_val(), mmio_invalidate_pid() and mmio_invalidate_range() as well as I couldn't find any callers of those that set it to anything other than true. > struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS]; > unsigned long pid = npu_context->mm->context.id; > + unsigned long atsd_start = 0; > + unsigned long end = start + size - 1; > + int atsd_psize = MMU_PAGE_COUNT; > + > + /* > + * Convert the input range into one of the supported sizes. If the range > + * doesn't fit, use the next larger supported size. Invalidation latency > + * is high, so over-invalidation is preferred to issuing multiple > + * invalidates. > + */ > + if (size == PAGE_64K) { We also support 4K page sizes on PPC. If I am not mistaken this means every ATSD would invalidate the entire GPU TLB for a the given PID on those systems. Could we change the above check to `if (size <= PAGE_64K)` to avoid this? > + atsd_start = start; Which would also require: atsd_start = ALIGN_DOWN(start, PAGE_64K); > + atsd_psize = MMU_PAGE_64K; > + } else if (ALIGN_DOWN(start, PAGE_2M) == ALIGN_DOWN(end, PAGE_2M)) { Wouldn't this lead to under invalidation in ranges which happen to cross a 2M boundary? For example invalidating a 128K (ie. 2x64K pages) range with start == 0x1f0000 and end == 0x210000 would result in an invalidation of the range 0x0 - 0x200000 incorrectly leaving 0x200000 - 0x210000 in the GPU TLB. > + atsd_start = ALIGN_DOWN(start, PAGE_2M); > + atsd_psize = MMU_PAGE_2M; > + } else if (ALIGN_DOWN(start, PAGE_1G) == ALIGN_DOWN(end, PAGE_1G)) { Ditto. > + atsd_start = ALIGN_DOWN(start, PAGE_1G); > + atsd_psize = MMU_PAGE_1G; > + } > - Alistair