Re: [RFC v2] non-temporal memcpy

Mattias Rönnblom Wed, 10 Aug 2022 04:59:51 -0700

On 2022-08-09 19:24, Morten Brørup wrote:

From: Stephen Hemminger [mailto:[email protected]]
Sent: Tuesday, 9 August 2022 17.26


On Tue, 9 Aug 2022 11:46:19 +0200
Morten Brørup <[email protected]> wrote:


I don't think memcpy() functions should have alignment requirements.

That's not very practical, and violates the principle of least
surprise.


I didn't make the CPUs with these alignment requirements.

However, I will offer optimized performance in a generic NT memcpy()
function in the cases where the individual alignment requirements of
various CPUs happen to be met.

Rather than making a generic equivalent memcpy function, why not have
something which only takes aligned data.


Our application is copying data that does not meet the x86 NT load alignment
requirement (16 bytes), so the function must support that. Specifically, our
application is copying complete or truncated IP packets excl. the Ethernet and
VLAN headers, i.e. offset by 14, 18 or 22 bytes from the cache line aligned
packet buffer.

Sure, but you can use regular loads for the non-aligned parts, and then
continue to use NT loads for the rest of the data. I suspect there is no
point in doing NT loads for data on the same cache line that you've done
regular loads for, so you might as well treat the alignment requirement
as 64 bytes, not 16.
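
To make the split concrete, here is a rough sketch of what I mean. It is not from the RFC patch: the names nt_split_src() and copy_split() are made up for illustration, a 64-byte cache line is assumed, and the NT-load loop is only a memcpy() placeholder so the sketch stays portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/* Hypothetical helper type: how a copy of len bytes from src is split. */
struct nt_split {
	size_t head; /* bytes up to the first cache line boundary (regular loads) */
	size_t body; /* whole cache lines (candidates for NT loads) */
	size_t tail; /* bytes after the last whole cache line (regular loads) */
};

static struct nt_split nt_split_src(const void *src, size_t len)
{
	struct nt_split s;
	size_t misalign = (size_t)((uintptr_t)src & (CACHE_LINE - 1));

	/* Head: distance from src to the next cache line boundary. */
	s.head = misalign ? CACHE_LINE - misalign : 0;
	if (s.head > len)
		s.head = len;

	/* Body: as many whole cache lines as fit in what remains. */
	s.body = (len - s.head) & ~(size_t)(CACHE_LINE - 1);

	/* Tail: whatever is left over. */
	s.tail = len - s.head - s.body;
	return s;
}

static void copy_split(void *dst, const void *src, size_t len)
{
	const struct nt_split s = nt_split_src(src, len);
	uint8_t *d = dst;
	const uint8_t *p = src;

	/* Head: regular loads; this cache line is touched anyway. */
	memcpy(d, p, s.head);

	/* Body: placeholder. A real implementation would loop over
	 * cache-line-aligned NT loads here instead of calling memcpy(). */
	memcpy(d + s.head, p + s.head, s.body);

	/* Tail: regular loads again. */
	memcpy(d + s.head + s.body, p + s.head + s.body, s.tail);
}
```

For a 128-byte copy starting 14 bytes into a cache-line-aligned buffer, this gives a 50-byte head, one 64-byte line for the NT path and a 14-byte tail. Only the source side is considered; destination alignment for NT stores would be a separate question.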

And to avoid user confusion
change the name to be something not suggestive of memcpy.

Maybe rte_non_cache_copy()?

Want to avoid the naive user just doing s/memcpy/rte_memcpy_nt/ and
expecting everything to work.


I see the risk you point out here... But unlike rte_memcpy(), it is not
advertised in presentations, whitepapers and elsewhere as having much better
performance than classic memcpy(), which is what might lead to that
misconception. So the probability should be low.
