On Thu, 12 Jan 2017 10:30:58 +0800 Yuanhan Liu <yuanhan....@linux.intel.com> wrote:
> On Wed, Jan 11, 2017 at 03:51:22PM +0100, Thomas Monjalon wrote:
> > 2017-01-11 12:27, Yuanhan Liu:
> > > The fact that virtio net header is initiated to zero in PMD driver
> > > init stage means that these costly writes are unnecessary and could
> > > be avoided:
> > >
> > >     if (hdr->csum_start != 0)
> > >         hdr->csum_start = 0;
> > >
> > > And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
> > > performance drop introduced by TSO enabling is recovered: it could
> > > be up to 20% in micro benchmarking.
> >
> > This patch is adding a condition to assignments.
> > We need a benchmark on other architectures like ARM. Please anyone?
>
> I think the cost of condition should be way lower than the cost from the
> penalty introduced by the cache issue, that I don't see it would perform
> bad on other platforms.
>
> But, of course, testing is always welcome!
>
>     --yliu

Hello,

we've done a synthetic measurement, principle briefly:

== Without condition check ==

start = gettimeofday();
for (i = 0; i < 1024*1024*128; ++i) {
    hdr->csum_start = 0;
    hdr->csum_offset = 0;
    hdr->flags = 0;
}
end = gettimeofday();

== With condition check ==

start = gettimeofday();
for (i = 0; i < 1024*1024*128; ++i) {
    ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
    ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
    ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
}
end = gettimeofday();

== Results ==

Computed as total time of all threads:

for i = 1..THREAD_COUNT:
    result += end[i] - start[i]

cpu                 threads  without-check (ms)  with-check (ms)
Xeon E5-2670              1                 516              529
Xeon E5-2670              2                1155              953
Xeon E5-2670              8                8947             5044
Xeon E5-2670             16               23335            16836
Zynq-7020 (armv7)         1                6735             7205
Zynq-7020 (armv7)         2               13753            14418

The advantage for Intel is evident when increasing the number of
threads. However, on 32-bit ARMs we might expect some performance drop.

Regards
Jan

> > >
> > > [...]
> > > +/* avoid write operation when necessary, to lessen cache issues */
> > > +#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
> > > +	if ((var) != (val))			\
> > > +		(var) = (val);			\
> > > +} while (0)
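
For anyone who wants to repeat the measurement on another platform, the
principle described above could be sketched roughly as below. This is not
the program used for the numbers in this mail, only a minimal
reconstruction under several assumptions: struct bench_hdr, bench_worker()
and run_bench() are hypothetical names, the header is a stand-in carrying
just the three fields mentioned, each thread gets its own header (the mail
does not say whether the header was shared between threads), the accesses
go through a volatile pointer so the compiler cannot drop the repeated
stores, and clock_gettime(CLOCK_MONOTONIC) replaces gettimeofday().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Macro from the quoted patch: skip the store when the value is unchanged. */
#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
	if ((var) != (val))			\
		(var) = (val);			\
} while (0)

/* Hypothetical stand-in holding only the three fields named in the mail. */
struct bench_hdr {
	uint8_t  flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

#define BENCH_MAX_THREADS 64

struct bench_arg {
	struct bench_hdr hdr;	/* assumption: one private header per thread */
	long iterations;
	int with_check;
	double elapsed_ms;
};

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

static void *bench_worker(void *p)
{
	struct bench_arg *arg = p;
	/* volatile forces a real load/store on every iteration */
	volatile struct bench_hdr *hdr = &arg->hdr;
	double start;
	long i;

	assert(arg->iterations >= 0);
	start = now_ms();
	if (arg->with_check) {
		for (i = 0; i < arg->iterations; ++i) {
			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
		}
	} else {
		for (i = 0; i < arg->iterations; ++i) {
			hdr->csum_start = 0;
			hdr->csum_offset = 0;
			hdr->flags = 0;
		}
	}
	arg->elapsed_ms = now_ms() - start;
	return NULL;
}

/* Returns the per-thread times summed up (ms), or -1.0 on error. */
double run_bench(int threads, long iterations, int with_check)
{
	pthread_t tids[BENCH_MAX_THREADS];
	struct bench_arg args[BENCH_MAX_THREADS];
	double total = 0.0;
	int started = 0;
	int i;

	if (threads < 1 || threads > BENCH_MAX_THREADS)
		return -1.0;

	for (i = 0; i < threads; ++i) {
		args[i] = (struct bench_arg){ .iterations = iterations,
					      .with_check = with_check };
		if (pthread_create(&tids[i], NULL, bench_worker, &args[i]) != 0)
			break;
		started++;
	}
	/* result += end[i] - start[i], as in the description above */
	for (i = 0; i < started; ++i) {
		pthread_join(tids[i], NULL);
		total += args[i].elapsed_ms;
	}
	return started == threads ? total : -1.0;
}
```

A caller would compare, for example, run_bench(8, 1024L*1024*128, 0)
against run_bench(8, 1024L*1024*128, 1). Note that in this sketch the
per-thread headers sit next to each other in one array, so neighbouring
threads may touch the same cache line; padding struct bench_arg to the
cache-line size would likely change the picture, so treat any numbers
from it with care.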