On 31/07/2014 10:59, Ming Lei wrote:
>> No guesses please.  Actually that's also my guess, but since you are
>> submitting the patch you must do better and show profiles where stack
>> switching disappears after the patches.
>
> Below are the hardware events reported by 'perf stat' when running the
> fio randread benchmark for 2 min in a VM (single vq, 2 jobs):
>
> sudo ~/bin/perf stat -e
> L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses
> ./nqemu-start-mq 4 1
>
> 1) Without bypassing the coroutine (forcing 's->raw_format' to false,
> see patch 5/15):
>
> - throughput: 95K
> 232,564,905,115 instructions
> 161.991075781 seconds time elapsed
>
>
> 2) With bypassing the coroutine:
> - throughput: 115K
> 255,526,629,881 instructions
> 162.333465490 seconds time elapsed
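Ok, so you are saving roughly 10% of the instructions per I/O. Dividing
the totals by IOPS gives 232G / 95K = 2.45M before and 255G / 115K = 2.22M
after, a ~9.4% reduction; since both runs lasted about 162 seconds, the
same reduction holds per completed I/O (roughly 15.1K vs 13.7K
instructions). That's not small, and it's good for CPU utilization even
if you were not increasing IOPS.

On top of this, can you provide the stack traces, so that we can see the
difference in the profiles?

Paolo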
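For anyone re-checking the arithmetic, here is a minimal Python sketch that normalizes the quoted perf stat totals. Dividing total instructions by IOPS alone yields instructions per (I/O per second); multiplying IOPS by the elapsed time instead gives the total number of I/Os completed, hence a true per-I/O figure. The function names are illustrative only, and the sketch assumes the rounded fio throughput (95K / 115K IOPS) was sustained over the whole perf stat window.

```python
# Illustrative helpers (hypothetical names) for normalizing perf stat totals.
# Assumption: the fio IOPS figure held for the entire measured interval.

def instructions_per_io(instructions: int, iops: float, seconds: float) -> float:
    """Total retired instructions divided by total I/Os completed."""
    total_ios = iops * seconds
    return instructions / total_ios


def instructions_per_iops(instructions: int, iops: float) -> float:
    """Shortcut used in the thread: total instructions / IOPS (ignores run length)."""
    return instructions / iops


# (instructions, IOPS, elapsed seconds) as quoted from the two perf stat runs.
with_coroutine = (232_564_905_115, 95_000, 161.991075781)
bypass_coroutine = (255_526_629_881, 115_000, 162.333465490)

before = instructions_per_io(*with_coroutine)
after = instructions_per_io(*bypass_coroutine)
saving = 1 - after / before

# Ratio from the shortcut; close to after/before because run lengths match.
shortcut_ratio = (instructions_per_iops(*bypass_coroutine[:2])
                  / instructions_per_iops(*with_coroutine[:2]))

if __name__ == "__main__":
    print(f"with coroutine:   {before:,.0f} instructions per I/O")
    print(f"bypass coroutine: {after:,.0f} instructions per I/O")
    print(f"saving:           {saving:.1%}")
    print(f"shortcut ratio:   {shortcut_ratio:.3f}")
```

The shortcut is only safe here because the two runs have nearly identical elapsed times; with different run lengths, the time-normalized figure is the one to compare.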