On Wed, 2018-10-03 at 18:51:32 UTC, Mark Hairgrove wrote:
> There are two types of ATSDs issued to the NPU: invalidates targeting a
> specific virtual address and invalidates targeting the whole address
> space. In both cases prior to this change, the sequence was:
>
> for each NPU
>     - Write the target address to the XTS_ATSD_AVA register
>     - EIEIO
>     - Write the launch value to issue the ATSD
>
> First, a target address is not required when invalidating the whole
> address space, so that write and the EIEIO have been removed. The AP
> (size) field in the launch is not needed either.
>
> Second, for per-address invalidates the above sequence is inefficient in
> the common case of multiple NPUs because an EIEIO is issued per NPU. This
> unnecessarily forces the launches of later ATSDs to be ordered with the
> launches of earlier ones. The new sequence only issues a single EIEIO:
>
> for each NPU
>     - Write the target address to the XTS_ATSD_AVA register
> EIEIO
> for each NPU
>     - Write the launch value to issue the ATSD
>
> Performance results were gathered using a microbenchmark which creates a
> 1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
> strides across the allocation.
>
> With only a single NPU active (one GPU) the difference is in the noise for
> both types of invalidates (+/-1%).
>
> With two NPUs active (on a 6-GPU system) the effect is more noticeable:
>
>              mprotect rate (GB/s)
> Stride    Before    After    Speedup
> 64K          5.9      6.5        10%
> 1M          31.2     33.4         7%
> 2M          36.3     38.7         7%
> 4M         322.6    356.7        11%
>
> Signed-off-by: Mark Hairgrove <mhairgr...@nvidia.com>
> Reviewed-by: Alistair Popple <alist...@popple.id.au>
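For readers skimming the thread, the ordering change in the patch description can be sketched in plain C. This is a hypothetical in-memory model, not the kernel code: `struct npu_model`, `mock_eieio()` and the `invalidate_*` helpers are illustrative names, registers are ordinary struct fields rather than MMIO, and the "barrier" only increments a counter so the number of EIEIOs per invalidate can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NPUS 6 /* assumption: matches the 6-GPU system mentioned above */

/* Hypothetical stand-in for one NPU's ATSD registers (not real MMIO). */
struct npu_model {
    uint64_t atsd_ava;    /* last value written to "XTS_ATSD_AVA" */
    uint64_t atsd_launch; /* last launch value written */
    unsigned launches;    /* number of ATSDs issued to this NPU */
};

/* Counts barriers instead of executing a real eieio instruction. */
static unsigned eieio_count;

static void mock_eieio(void) { eieio_count++; }

/* Old per-address sequence: an EIEIO between every AVA write and launch,
 * so n NPUs cost n barriers and each launch is ordered after the
 * previous NPU's launch. */
static void invalidate_va_old(struct npu_model *npus, size_t n,
                              uint64_t va, uint64_t launch)
{
    assert(n <= MAX_NPUS);
    for (size_t i = 0; i < n; i++) {
        npus[i].atsd_ava = va;
        mock_eieio();
        npus[i].atsd_launch = launch;
        npus[i].launches++;
    }
}

/* New per-address sequence: write every AVA, one EIEIO, then launch all. */
static void invalidate_va_new(struct npu_model *npus, size_t n,
                              uint64_t va, uint64_t launch)
{
    assert(n <= MAX_NPUS);
    for (size_t i = 0; i < n; i++)
        npus[i].atsd_ava = va;
    mock_eieio(); /* orders all AVA writes before any launch */
    for (size_t i = 0; i < n; i++) {
        npus[i].atsd_launch = launch;
        npus[i].launches++;
    }
}

/* New whole-address-space sequence: no target address, so no AVA write
 * and no EIEIO; only the launch is written. */
static void invalidate_all_new(struct npu_model *npus, size_t n,
                               uint64_t launch)
{
    assert(n <= MAX_NPUS);
    for (size_t i = 0; i < n; i++) {
        npus[i].atsd_launch = launch;
        npus[i].launches++;
    }
}
```

With two NPUs the old per-address path records two barriers per invalidate and the new one records one; the whole-address-space path records none. The launch values are treated as opaque here; the AP (size) field the patch drops is not modelled.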
Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7ead15a1442b25e12a6f0791a7c7a5

cheers