[dpdk-dev] [snabb-devel] RE: [PATCH 0/4] DPDK memcpy optimization

Luke Gorrie Tue, 27 Jan 2015 14:57:44 +0100

Hi again John,

Thank you for the patient answers :-)


Thank you for pointing this out: I was mistakenly testing your Sandy Bridge
code on Haswell (lacking -DRTE_MACHINE_CPUFLAG_AVX2).
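For anyone else who trips over the same thing, here is a minimal sketch of a guard I could have put in my test harness. It only reports whether the RTE_MACHINE_CPUFLAG_AVX2 macro was defined at compile time; the helper name rte_memcpy_build_variant is made up for illustration and is not part of DPDK.

```c
#include <string.h>

/*
 * Hypothetical helper (not a DPDK API): report which code path the
 * build selected, based only on whether RTE_MACHINE_CPUFLAG_AVX2 was
 * passed to the compiler (e.g. via -DRTE_MACHINE_CPUFLAG_AVX2).
 */
static const char *rte_memcpy_build_variant(void)
{
#ifdef RTE_MACHINE_CPUFLAG_AVX2
    return "AVX2";
#else
    return "no AVX2 flag";
#endif
}

/* Returns 1 if the build was compiled with the AVX2 flag, else 0. */
static int built_with_avx2_flag(void)
{
    return strcmp(rte_memcpy_build_variant(), "AVX2") == 0;
}
```

Printing rte_memcpy_build_variant() at the start of a benchmark run makes it obvious which variant the numbers belong to.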

Correcting that, your code is both the fastest and the smallest in my
humble micro benchmarking tests.

Looks like you have done great work! You probably knew that already :-) but
thank you for walking me through it.

The code compiles to 745 bytes of object code (smaller than glibc 2.20
memcpy) and cachebenches like this:

                Memory Copy Library Cache Test

C Size          Nanosec         MB/sec          % Chnge
-------         -------         -------         -------
256             0.01            97587.60        1.00
384             0.01            97628.83        1.00
512             0.01            97613.95        1.00
768             0.01            147811.44       0.66
1024            0.01            158938.68       0.93
1536            0.01            168487.49       0.94
2048            0.01            174278.83       0.97
3072            0.01            156922.58       1.11
4096            0.01            145811.59       1.08
6144            0.01            157388.27       0.93
8192            0.01            149616.95       1.05
12288           0.01            149064.26       1.00
16384           0.01            107895.06       1.38

The key difference from my perspective is that glibc 2.20 memcpy
performance drops sharply for copies >= 2048 bytes, where it switches from
vector moves to string moves, while your code stays consistent.
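To look at one copy size in isolation, something along these lines should do. This is a rough sketch and not what cachebench does internally: the helper name copy_mb_per_sec, the buffer setup and the timing method are all my own assumptions, and it times the libc memcpy, so the call would need to be swapped for rte_memcpy (from DPDK's rte_memcpy.h) to compare the two.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy `size` bytes `iterations` times and return
 * the throughput in MB/sec (1 MB = 1e6 bytes), or -1.0 on bad arguments,
 * allocation failure, or an unmeasurably short run.
 */
static double copy_mb_per_sec(size_t size, size_t iterations)
{
    struct timespec t0, t1;
    volatile unsigned char sink = 0;  /* keeps the copies from being optimized away */
    unsigned char *src, *dst;
    double secs;

    if (size == 0 || iterations == 0)
        return -1.0;

    src = malloc(size);
    dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iterations; i++) {
        src[i % size] = (unsigned char)i;  /* vary the input a little */
        memcpy(dst, src, size);            /* swap in rte_memcpy to compare */
        sink ^= dst[i % size];             /* read back part of the result */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)size * (double)iterations / secs / 1e6;
}
```

Calling it for 1024 and then 2048 bytes with the same iteration count is enough to see whether throughput falls off at the threshold. The buffers stay cache-resident, so the absolute numbers say nothing about real traffic.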

I will take it for a spin in a real application.

Cheers,
-Luke
