Thanks Mark,

Looks like some worthwhile improvements to be had. I've added a couple of
comments inline below.

> +#define PAGE_64K (64UL * 1024)
> +#define PAGE_2M (2UL * 1024 * 1024)
> +#define PAGE_1G (1UL * 1024 * 1024 * 1024)

include/linux/sizes.h includes definitions for SZ_64K, SZ_2M, SZ_1G, etc. so
unless they're redefined here for some reason I personally think it's cleaner to
use those.

>  /*
> - * Invalidate either a single address or an entire PID depending on
> - * the value of va.
> + * Invalidate a virtual address range
>   */
> -static void mmio_invalidate(struct npu_context *npu_context, int va,
> -                     unsigned long address, bool flush)
> +static void mmio_invalidate(struct npu_context *npu_context,
> +                     unsigned long start, unsigned long size, bool flush)

With this optimisation every caller of mmio_invalidate() sets flush == true so
it no longer appears to be used. We should drop it as a parameter unless you
think there might be some reason to use it in future?

We could therefore also drop it as a parameter to get_atsd_launch_val(),
mmio_invalidate_pid() and mmio_invalidate_range(), as I couldn't find any
callers of those that set it to anything other than true.

>       struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS];
>       unsigned long pid = npu_context->mm->context.id;
> +     unsigned long atsd_start = 0;
> +     unsigned long end = start + size - 1;
> +     int atsd_psize = MMU_PAGE_COUNT;
> +
> +     /*
> +      * Convert the input range into one of the supported sizes. If the range
> +      * doesn't fit, use the next larger supported size. Invalidation latency
> +      * is high, so over-invalidation is preferred to issuing multiple
> +      * invalidates.
> +      */
> +     if (size == PAGE_64K) {

We also support 4K page sizes on PPC. If I am not mistaken this means every ATSD
would invalidate the entire GPU TLB for the given PID on those systems. Could
we change the above check to `if (size <= PAGE_64K)` to avoid this?

> +             atsd_start = start;

Which would also require:

            atsd_start = ALIGN_DOWN(start, PAGE_64K);
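
To make the suggestion a bit more concrete, here is a rough userspace sketch of
the size selection I have in mind. It is untested against the real code and
makes a few assumptions: pick_atsd_size() is a hypothetical helper that is not
in the patch, the SZ_* values and ALIGN_DOWN() are local stand-ins for the
kernel definitions, and a return value of 0 stands for "invalidate the whole
PID". I have also assumed the same-aligned-block comparison can be used for the
64K case, so that a small range straddling a 64K boundary falls through to the
next larger size.

```c
#include <assert.h>

/* Local stand-ins for SZ_64K, SZ_2M and SZ_1G from include/linux/sizes.h. */
#define SZ_64K 0x00010000UL
#define SZ_2M  0x00200000UL
#define SZ_1G  0x40000000UL

/* Userspace stand-in for the kernel's ALIGN_DOWN(); a must be a power of two. */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1UL))

/*
 * Hypothetical helper, not part of the patch. Returns the smallest
 * supported size whose naturally aligned block contains the whole range
 * [start, start + size - 1], and stores the aligned start address in
 * *atsd_start. Returns 0 (and sets *atsd_start to 0) if no single block
 * covers the range, meaning the caller should invalidate the whole PID.
 */
static unsigned long pick_atsd_size(unsigned long start, unsigned long size,
				    unsigned long *atsd_start)
{
	static const unsigned long sizes[] = { SZ_64K, SZ_2M, SZ_1G };
	unsigned long end;
	unsigned int i;

	assert(size != 0);
	end = start + size - 1;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		/* Both ends in the same aligned block: one ATSD covers it. */
		if (ALIGN_DOWN(start, sizes[i]) == ALIGN_DOWN(end, sizes[i])) {
			*atsd_start = ALIGN_DOWN(start, sizes[i]);
			return sizes[i];
		}
	}

	*atsd_start = 0;
	return 0;
}
```

With something along these lines a single 4K page would only need a 64K
invalidation rather than a flush of the whole PID.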

> +             atsd_psize = MMU_PAGE_64K;
> +     } else if (ALIGN_DOWN(start, PAGE_2M) == ALIGN_DOWN(end, PAGE_2M)) {

Wouldn't this lead to under-invalidation for ranges which happen to cross a 2M
boundary? For example invalidating a 128K (i.e. 2x64K pages) range with start ==
0x1f0000 and end == 0x210000 would result in an invalidation of the range 0x0 -
0x200000, incorrectly leaving 0x200000 - 0x210000 in the GPU TLB.

> +             atsd_start = ALIGN_DOWN(start, PAGE_2M);
> +             atsd_psize = MMU_PAGE_2M;
> +     } else if (ALIGN_DOWN(start, PAGE_1G) == ALIGN_DOWN(end, PAGE_1G)) {

Ditto.

> +             atsd_start = ALIGN_DOWN(start, PAGE_1G);
> +             atsd_psize = MMU_PAGE_1G;
> +     }
>  

- Alistair
