On Mon, May 26, 2014 at 1:14 PM, erik quanstrom <quans...@quanstro.net> wrote:
> so, i've done a little bit more work characterizing the performance
> of the scheduler correctness changes, and i now have some understanding
> on why e.g. ping times are a bit slower.
>
> the old code essentially let processor 0 spin in runproc, other processors
> called halt.  the new code uses monmwait to wait for a change on all processors.
> this has some significant impacts on performance and power use.  for example,
> on my test box with 4c/8t:
>
>             spin/halt   monmwait   spin/monmwait
> ping        8µs         14µs       8µs             # ip/ping -n10 $sysname
> mk          6.26s       3.98s      3.80s           # make nix kernel
> fans        audible     silent     audible
> δpower      -           -24w       0               # (resolution = .1A = 12w @ 120v)
>
> this seems to indicate the latency is all in runproc(), and not waiting for things
> to be ready and assuming they will be has a big performance boost.
>
> (the third column, testing spin on mach 0, plus monmwait on the others, was done
> to tell if monmwait has high latency or not.)
>
> i'd really be interested to see what this does on 24c/48t machines.  something
> tells me the performance impacts would be huge, and different.
>
> - erik
>
> ---
> ps.  hzsched in the distribution is 10% off for HZ=100, since
> schedticks = m->ticks + HZ/10, and delaysched tests for > rather than the expected >=.
Nice. Excited to see how a cleaned-up + simplified runproc() and the per-Mach queues could also change things. Any idea why the ping numbers with monmwait don't show the same improvement as the other measurements?
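
For anyone following along, here's roughly how I read the two idle strategies being compared. This is only an illustrative sketch, not the actual nix runproc(): the Sched/Proc layout, dequeueproc(), pause(), halt(), and the monmwait(addr, oldval) helper are all my own stand-ins, with monmwait assumed to wrap the x86 MONITOR/MWAIT pair and return once *addr no longer holds oldval (or on any other wakeup).

	/*
	 * illustrative sketch only -- not the real nix scheduler code.
	 * Sched, Proc, dequeueproc(), pause(), halt() and monmwait()
	 * are assumed stand-ins, not the actual kernel interfaces.
	 */

	/* old behaviour: mach 0 spins in runproc, the other machs halt */
	static Proc*
	idlespinhalt(Sched *sch)
	{
		for(;;){
			if(sch->nready > 0)
				return dequeueproc(sch);
			if(m->machno == 0)
				pause();	/* busy-wait: lowest wakeup latency, burns power */
			else
				halt();		/* hlt: core sleeps until the next interrupt */
		}
	}

	/* new behaviour: every mach waits on a run-queue generation counter */
	static Proc*
	idlemonmwait(Sched *sch)
	{
		ulong gen;

		for(;;){
			gen = sch->gen;		/* ready() is assumed to bump this when it queues a proc */
			if(sch->nready > 0)
				return dequeueproc(sch);
			monmwait(&sch->gen, gen);	/* doze in low power until gen changes; returns
							 * immediately if it already changed */
		}
	}

If that's the right shape, the spin/monmwait column bringing ping back to 8µs would suggest the extra ~6µs in the middle column is just the cost of waking out of mwait on the mach that handles the reply, which fits erik's reading that the latency is all in runproc().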