On Fri, 29 Jul 2022 12:13:52 +0000 Konstantin Ananyev <konstantin.anan...@huawei.com> wrote:
> Sorry, missed that part.
>
> > >
> > > Another question - who will do 'sfence' after the copying?
> > > Would it be inside memcpy_nt (seems quite costly), or would
> > > it be another API function for that: memcpy_nt_flush() or so?
> >
> > Outside. Only the developer knows when it is required, so it wouldn't make
> > any sense to add the cost inside memcpy_nt().
> >
> > I don't think we should add a flush function; it would just be another name
> > for an already existing function. Referring to the required
> > operation in the memcpy_nt() function documentation should suffice.
> >
>
> Ok, but again wouldn't it be arch specific?
> AFAIK for x86 it needs to boil down to sfence, for other architectures - I
> don't know.
> If you think there already is some generic one (rte_wmb?) that would always
> produce correct instructions - sure let's use it.
>
>

It makes sense to use non-temporal copy in a few select places.
But it would add unnecessary complexity to DPDK if every function
that could cause a copy had a non-temporal variant.

Maybe rte_memcpy could just have a threshold (config value?), so that
a copy larger than a certain size automatically becomes non-temporal.
Small copies wouldn't matter; the optimization is more about avoiding
cache size issues with large streams of data.
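
To make the threshold idea concrete, here is a rough sketch, not a proposed
patch. Everything named here is hypothetical: COPY_NT_THRESHOLD, copy_nt(),
copy_uses_nt(), copy_auto() and copy_fence() are illustration-only names, not
existing DPDK symbols, and the 256 KiB value is an arbitrary placeholder.
It assumes x86 with SSE2 for the streaming stores and falls back to plain
memcpy() elsewhere. The fence is kept outside the copy, as discussed in the
quoted text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical tunable; not an existing DPDK config option. */
#ifndef COPY_NT_THRESHOLD
#define COPY_NT_THRESHOLD (256 * 1024)
#endif

/* Hypothetical helper: copy using non-temporal stores. Does NOT fence. */
static void copy_nt(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Streaming stores need a 16-byte aligned destination,
     * so copy the unaligned head the ordinary way. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Bulk: unaligned load, cache-bypassing store. */
    for (; len >= 16; d += 16, s += 16, len -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }

    /* Tail of fewer than 16 bytes. */
    memcpy(d, s, len);
#else
    /* Assumption: no streaming stores available, use a normal copy. */
    memcpy(dst, src, len);
#endif
}

/* Policy check: only large copies take the non-temporal path. */
static int copy_uses_nt(size_t len)
{
    return len >= COPY_NT_THRESHOLD;
}

/* Hypothetical wrapper: pick the path based on the copy size. */
static void copy_auto(void *dst, const void *src, size_t len)
{
    if (copy_uses_nt(len))
        copy_nt(dst, src, len);
    else
        memcpy(dst, src, len);
}

/* The caller issues this once, when the stores must become visible. */
static void copy_fence(void)
{
#if defined(__SSE2__)
    _mm_sfence();
#endif
}
```

One consequence of hiding the choice behind a size threshold: the caller
can no longer tell from the call site whether a fence is needed, so it
would either have to fence unconditionally or query something like
copy_uses_nt() first.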