RE: [RFC v2] non-temporal memcpy

Honnappa Nagarahalli Thu, 11 Aug 2022 15:25:24 -0700

<snip>

> >
> >>
> >>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >>> Sent: Wednesday, 10 August 2022 13.56
> >>>
> >>> On 2022-08-09 17:26, Stephen Hemminger wrote:
> >>
> >> [...]
> >>
> >>>
> >>> Alignment seems like a non-issue to me. A NT-store memcpy() can be
> >>> made free of alignment requirements, incurring only a very slight
> >>> cost for the always-aligned case (who has their data always 16-byte
> >>> aligned anyways?).
> >>>
> >>> The memory barrier required on x86 seems like a bigger issue.
> >>>
> >>>> Maybe rte_non_cache_copy()?
> >>>>
> >>>
> >>> rte_memcpy_nt_weakly_ordered(), or rte_memcpy_nt_weak(). And a
> >>> rte_memcpy_nt() with the sfence in place, which the user hopefully
> >>> will find first? I don't know. I would prefer not having the weak
> >>> variant at all.
> > I think providing a weakly ordered version is required to offset the cost of
> > the barriers. One might be able to copy multiple packets and then issue a
> > barrier.
> >
> 
> On what architecture?
I am talking about the Arm architecture, which needs barriers between
normal and NT operations.
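
To make the batching idea concrete, here is a rough sketch of what I have in mind. It is not a proposed implementation: all names (memcpy_nt_sketch(), nt_store_barrier(), NT_F_WEAK_ORDER, copy_burst()) are hypothetical, plain memcpy() stands in for the NT store loop, and a C11 release fence stands in for the architecture-specific barrier.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: caller accepts weak ordering and issues the barrier itself. */
#define NT_F_WEAK_ORDER 0x1u

/*
 * Hypothetical barrier that orders earlier NT stores before later stores.
 * A C11 release fence is only a portable stand-in for the real,
 * architecture-specific instruction (e.g. sfence on x86).
 */
static inline void nt_store_barrier(void)
{
	atomic_thread_fence(memory_order_release);
}

/*
 * Hypothetical NT copy. memcpy() stands in for the NT store loop.
 * Without NT_F_WEAK_ORDER the barrier is issued on every call, which is
 * the safe but potentially slower default.
 */
static inline void memcpy_nt_sketch(void *dst, const void *src, size_t len,
				    unsigned int flags)
{
	memcpy(dst, src, len);
	if (!(flags & NT_F_WEAK_ORDER))
		nt_store_barrier();
}

/* Copy a burst of packets weakly ordered, then pay for a single barrier. */
static inline void copy_burst(void *dst[], void *const src[],
			      const size_t len[], unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		memcpy_nt_sketch(dst[i], src[i], len[i], NT_F_WEAK_ORDER);
	nt_store_barrier();
}
```

With a burst of n packets, the weak variant issues one barrier instead of n. That is the cost I would like to be able to offset.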


> 
> I assumed that only x86 had the peculiar property of having different memory
> models for regular and NT load/stores.
> 
> >>>
> >>> Accepting weak memory ordering (i.e., no sfence) could also be one
> >>> of the flags, assuming rte_memcpy_nt() would have a flags parameter.
> >>> Default is safe (=memcpy() semantics), but potentially slower.
> >>
> >> Excellent idea!
> >>
> >>>
> >>>> Want to avoid the naive user just doing s/memcpy/rte_memcpy_nt/ and
> >>>> expect everything to work.
> >
