<snip>

> >
> >>
> >>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >>> Sent: Wednesday, 10 August 2022 13.56
> >>>
> >>> On 2022-08-09 17:26, Stephen Hemminger wrote:
> >>
> >> [...]
> >>
> >>>
> >>> Alignment seems like a non-issue to me. An NT-store memcpy() can be
> >>> made free of alignment requirements, incurring only a very slight
> >>> cost for the always-aligned case (who has their data always 16-byte
> >>> aligned anyways?).
> >>>
> >>> The memory barrier required on x86 seems like a bigger issue.
> >>>
> >>>> Maybe rte_non_cache_copy()?
> >>>>
> >>>
> >>> rte_memcpy_nt_weakly_ordered(), or rte_memcpy_nt_weak(). And a
> >>> rte_memcpy_nt() with the sfence in place, which the user hopefully
> >>> will find first? I don't know. I would prefer not having the weak
> >>> variant at all.
> > I think providing a weakly ordered version is required to offset the cost of the
> > barriers. One might be able to copy multiple packets and then issue a barrier.
> >
>
> On what architecture?
I am talking about the Arm architecture. The Arm architecture needs barriers between normal and NT operations.
>
> I assumed that only x86 had the peculiar property of having different memory
> models for regular and NT load/stores.
>
> >>>
> >>> Accepting weak memory ordering (i.e., no sfence) could also be one
> >>> of the flags, assuming rte_memcpy_nt() would have a flags parameter.
> >>> Default is safe (=memcpy() semantics), but potentially slower.
> >>
> >> Excellent idea!
> >>
> >>>
> >>>> Want to avoid the naive user just doing s/memcpy/rte_memcpy_nt/ and expect
> >>>> everything to work.
> >
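To make the flags idea discussed above concrete, here is a minimal x86-only sketch, not a proposed implementation. The names memcpy_nt_sketch(), copy_burst_sketch() and MEMCPY_NT_F_WEAK are hypothetical and do not exist in DPDK. It assumes SSE2, handles an unaligned destination with regular stores for the head and tail, and issues the sfence by default unless the caller opts into weak ordering. It does not cover Arm, where the barrier requirements differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */

/* Hypothetical flag (not a DPDK definition): caller accepts weak ordering. */
#define MEMCPY_NT_F_WEAK 0x1u

/* Sketch: NT-store copy with no alignment requirement on dst or src. */
static void *
memcpy_nt_sketch(void *dst, const void *src, size_t len, unsigned int flags)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	/* Head: regular stores until dst reaches a 16-byte boundary. */
	size_t head = (size_t)(-(uintptr_t)d & 15);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Body: unaligned loads, 16-byte aligned non-temporal stores. */
	for (; len >= 16; d += 16, s += 16, len -= 16)
		_mm_stream_si128((__m128i *)d,
				 _mm_loadu_si128((const __m128i *)s));

	/* Tail: regular stores for the remaining bytes (fewer than 16). */
	memcpy(d, s, len);

	/* Default is the safe variant; the weak variant leaves the fence
	 * to the caller. */
	if (!(flags & MEMCPY_NT_F_WEAK))
		_mm_sfence();

	return dst;
}

/* Sketch: copy a burst with weak ordering, then issue a single barrier. */
static void
copy_burst_sketch(void *const dst[], const void *const src[],
		  const size_t len[], unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		memcpy_nt_sketch(dst[i], src[i], len[i], MEMCPY_NT_F_WEAK);

	_mm_sfence(); /* one fence for the whole burst */
}
```

With flags set to 0, a plain s/memcpy/memcpy_nt_sketch/ substitution keeps the fence after every copy. Only a caller that explicitly passes the weak flag takes on the responsibility of fencing, as copy_burst_sketch() does once per burst.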