Rusty Russell <ru...@rustcorp.com.au> writes:
> Anthony Liguori <anth...@codemonkey.ws> writes:
>> I suspect this is a premature optimization. With a weak function called
>> directly in the accessors below, I suspect you would see no measurable
>> performance overhead compared to this approach.
>>
>> It's all very predictable so the CPU should do a decent job optimizing
>> the if () away.
>
> Perhaps. I was leery of introducing performance regressions, but the
> actual I/O tends to dominate anyway.
>
> So I tested this, by adding the patch (below) and benchmarking
> qemu-system-i386 on my laptop before and after.
>
> Setup: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
>        (Performance cpu governor enabled)
> Guest: virtio user net, virtio block on raw file, 1 CPU, 512MB RAM.
>        (Qemu run under eatmydata to eliminate syncs)

FYI, cache=unsafe is equivalent to using eatmydata.

> First test: ping -f -c 10000 -q 10.0.2.0 (100 times)
>   (Ping chosen since packets stay in qemu's user net code)
>
> BEFORE:
>   MIN: 824ms
>   MAX: 914ms
>   AVG: 876.95ms
>   STDDEV: 16ms
>
> AFTER:
>   MIN: 872ms
>   MAX: 933ms
>   AVG: 904.35ms
>   STDDEV: 15ms

I can reproduce this, although I also see a larger standard deviation.

BEFORE:
  MIN: 496
  MAX: 1055
  AVG: 873.22
  STDEV: 136.88

AFTER:
  MIN: 494
  MAX: 1456
  AVG: 947.77
  STDEV: 150.89

In my datasets, the stdev is higher in the after case, implying that
there is more variation. Indeed, the MIN is pretty much the same.

GCC is inlining the functions; I'm still surprised that it's measurable
at all.

At any rate, I think the advantage of not increasing the amount of
target-specific code outweighs the performance difference here. As you
said, if there is real I/O, the difference isn't noticeable.

Regards,

Anthony Liguori
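For reference, the MIN/MAX/AVG/STDDEV summary used in both sets of numbers
above can be reproduced from raw per-run timings with a short script. This is
only a sketch of that reporting step: the sample timings below are made up
for illustration, not measurements from this thread, and the thread does not
say whether population or sample standard deviation was used (the sketch
assumes population stddev).

```python
# Sketch: summarize per-run benchmark timings (in ms) the way the
# thread reports them: MIN, MAX, AVG, STDDEV.
import statistics

def summarize(times_ms):
    """Return MIN/MAX/AVG/STDDEV for a list of per-run timings."""
    return {
        "MIN": min(times_ms),
        "MAX": max(times_ms),
        "AVG": statistics.mean(times_ms),
        # Population stddev; the thread does not say which variant was used.
        "STDDEV": statistics.pstdev(times_ms),
    }

# Illustrative timings only -- not the actual data from this thread.
runs = [870, 880, 905, 860, 890]
stats = summarize(runs)
print("MIN: {MIN}ms  MAX: {MAX}ms  AVG: {AVG:.2f}ms  STDDEV: {STDDEV:.2f}ms"
      .format(**stats))
```

Note that with only 100 runs and stddevs of the size Anthony reports, the
BEFORE/AFTER averages overlap considerably, which supports his reading that
the MIN values are the more meaningful comparison.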