On Mon, Oct 10, 2022 at 11:36:11AM +0200, Morten Brørup wrote:
<snip>
> > For large copies, which I'm guessing is what non-temporal stores are
> > usually used for, this is hair splitting. For DPDK applications, it
> > might well be at least somewhat relevant, because such an application
> > may make an enormous amount of copies, each roughly the size of a
> > packet.
> >
> > If we had a rte_memcpy_ex() that only cared about copying whole cache
> > line in a NT manner, the application could add a clflushopt (or the
> > equivalent) after the copy, flushing the beginning and end cache
> > line of the destination buffer.
>
> That is a good idea.
>
> Furthermore, POWER and RISC-V don't have NT store, but if they have a cache
> line flush instruction, NT destination memcpy could be implemented for those
> architectures too - i.e. storing cache line sized blocks and flushing the
> cache, and letting the application flush the cache lines at the ends, if
> useful for the application.
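Just to check that I read the scheme above correctly, here is a rough sketch
of it in portable C. This is not DPDK code. The name copy_flush_whole_lines(),
the flush_line() hook and the 64-byte CACHE_LINE are all assumptions made for
illustration. flush_line() only counts calls here; on real hardware it would
be something like clflushopt on x86 or a Zicbom cbo.flush on RISC-V, and the
plain memcpy() stands in for whatever store sequence the architecture offers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64 /* assumed line size; real code would query the platform */

/* Hypothetical hook standing in for a real cache line flush instruction.
 * It only counts calls so that the sketch runs anywhere. */
static unsigned flush_calls;
static void flush_line(const void *p) { (void)p; flush_calls++; }

/* Copy len bytes. Every whole destination cache line is stored and then
 * flushed; partial lines at either end are copied but left in the cache,
 * so the caller can decide whether flushing them is useful. */
static void copy_flush_whole_lines(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Bytes needed to reach the next cache line boundary in dst. */
    size_t head = (CACHE_LINE - ((uintptr_t)d % CACHE_LINE)) % CACHE_LINE;
    if (head > len)
        head = len;

    memcpy(d, s, head); /* partial first line: not flushed */
    d += head; s += head; len -= head;

    while (len >= CACHE_LINE) {
        memcpy(d, s, CACHE_LINE);
        flush_line(d); /* whole line written, evict it */
        d += CACHE_LINE; s += CACHE_LINE; len -= CACHE_LINE;
    }

    memcpy(d, s, len); /* partial last line: not flushed */
}
```

With a 150-byte copy to an address 10 bytes past a line boundary, this flushes
only the single fully covered line and leaves the 54-byte head and 32-byte
tail to the caller.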
On RISC-V all stores are from a register (scalar or vector) to a memory
location. So is the reasoning behind flushing the cache line to free it up
for other data?

Other than that, there is a ratified RISC-V extension for cache management
operations (including flush): Zicbom. NT load/store hints are being worked
on right now.

-- 
Best Regards,
Stanislaw Kardach