On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> These are the results, in nanosec:
>
>                           AVG     MIN      MAX   WARM MAX
>
> NODEBUG no WFI           1890    1800     3170     2070
> NODEBUG WFI              4850    4810     7030     4980
> NODEBUG no WFI credit2   2217    2090     3420     2650
> NODEBUG WFI credit2      8080    7890    10320     8300
>
> DEBUG no WFI             2252    2080     3320     2650
> DEBUG WFI                6500    6140     8520     8130
> DEBUG WFI, credit2       8050    7870    10680     8450
>
> As you can see, depending on whether the guest issues a WFI or not
> while waiting for interrupts, the results change significantly.
> Interestingly, credit2 does worse than credit1 in this area.

I did some measuring myself, on x86, with different tools.

So, cyclictest is basically something very, very similar to Stefano's
app; its core measurement loop is sketched right below.
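In essence, it arms an absolute timer deadline, sleeps until it, and
then checks how late the wakeup actually was. A minimal sketch of that
loop, assuming Linux userspace with clock_nanosleep() (this is *not*
the real cyclictest code, which also sets SCHED_FIFO priority, pins
threads, etc.):

/*
 * Minimal sketch of a cyclictest-style wakeup latency loop:
 * sleep until an absolute deadline, then measure now - deadline.
 */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL
#define INTERVAL_NS     1000000LL   /* 1ms between wakeups */
#define LOOPS            100000

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

int main(void)
{
    struct timespec next, now;
    int64_t lat, min = INT64_MAX, max = 0, sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int i = 0; i < LOOPS; i++) {
        /* Arm the next absolute deadline. */
        next.tv_nsec += INTERVAL_NS;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }

        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        /* Wakeup latency = how far past the deadline we woke up. */
        lat = ts_to_ns(&now) - ts_to_ns(&next);
        if (lat < min) min = lat;
        if (lat > max) max = lat;
        sum += lat;
    }

    printf("avg %lld ns, min %lld ns, max %lld ns\n",
           (long long)(sum / LOOPS), (long long)min, (long long)max);
    return 0;
}

Built with gcc -O2, it prints avg/min/max wakeup latency in
nanoseconds, i.e. the same kind of numbers as in Stefano's table
above.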
I've run cyclictest both within Dom0 and inside a guest. I also ran a
Xen build (in this case, only inside the guest).

> We are down to 2000-3000ns. Then, I started investigating the
> scheduler. I measured how long it takes to run "vcpu_unblock":
> 1050ns, which is significant. I don't know what is causing the
> remaining 1000-2000ns, but I bet on another scheduler function. Do
> you have any suggestions on which one?

So, vcpu_unblock() calls vcpu_wake(), which then invokes the
scheduler's wakeup-related functions. If you time vcpu_unblock() from
beginning to end of the function, you actually capture quite a few
things. E.g., the scheduler lock is taken inside vcpu_wake(), so you
are basically including the time spent waiting for the lock in the
estimation. That is probably ok (as in, lock contention definitely is
relevant to latency), but then it is expected for things to be rather
different between Credit1 and Credit2.

I have, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
and here are the results. Numbers are in cycles (I've used RDTSC; the
instrumentation pattern is sketched further down in this mail) and,
to make sure I obtain consistent and comparable numbers, I've set the
frequency scaling governor to performance.

Dom0, [performance]
              cyclictest 1us    cyclictest 1ms    cyclictest 100ms
(cycles)      Credit1  Credit2  Credit1  Credit2  Credit1  Credit2
wakeup-avg       2429     2035     1980     1633     2535     1979
wakeup-max      14577   113682    15153   203136    12285   115164
sched-avg        1716     1860     2527     1651     2286     1670
sched-max       16059    15000    12297   101760    15831    13122

VM, [performance]
              cyclictest 1us    cyclictest 1ms    cyclictest 100ms  make -j xen
(cycles)      Credit1  Credit2  Credit1  Credit2  Credit1  Credit2  Credit1  Credit2
wakeup-avg       2213     2128     1944     2342     2374     2213     2429     1618
wakeup-max       9990    10104    11262     9927    10290    10218    14430    15108
sched-avg        2437     2472     1620     1594     2498     1759     2449     1809
sched-max       14100    14634    10071     9984    10878     8748    16476    14220

Actually, TSC on this box should be stable and invariant, so I guess I
can also try with the default governor. Will do that on Monday. Does
ARM have frequency scaling (I remember something about it on
xen-devel, but I am not sure whether it landed upstream)?

But anyway. You're seeing big differences between Credit1 and Credit2,
while I, at least as far as the actual schedulers' code is concerned,
don't. Credit2 shows higher wakeup-max values, but only in the cases
where the workload runs in dom0. It also shows better (lower)
averages, in both kinds of workload considered, and in both the dom0
and the VM case.

I therefore wonder what is actually responsible for the huge
differences between the two schedulers that you are seeing... It could
be lock contention, but with only 4 pCPUs and 2 active vCPUs, I
honestly doubt it...
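FWIW, the RDTSC measurement pattern is roughly the one below, reduced
to a standalone userspace harness so it can be compiled and run as-is.
The payload() function is a hypothetical stand-in for the code path
being timed; the in-hypervisor instrumentation wraps the
SCHED_OP(wake) and SCHED_OP(do_schedule) calls instead, but the
avg/max bookkeeping is the same. Treat it as a sketch, not the actual
patch:

/*
 * Illustrative RDTSC-based timing harness: read the TSC around a
 * call and keep a running average and maximum, the same bookkeeping
 * behind the wakeup-avg/max and sched-avg/max numbers above.
 * No serializing fences around RDTSC; good enough for a sketch.
 */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() */

#define SAMPLES 100000

/* Hypothetical stand-in for the code path being timed. */
static void payload(void)
{
    __asm__ __volatile__("" ::: "memory");
}

int main(void)
{
    uint64_t t1, t2, delta, max = 0, sum = 0;

    for (int i = 0; i < SAMPLES; i++) {
        t1 = __rdtsc();
        payload();
        t2 = __rdtsc();

        delta = t2 - t1;
        sum += delta;
        if (delta > max)
            max = delta;
    }

    printf("avg %llu cycles, max %llu cycles\n",
           (unsigned long long)(sum / SAMPLES),
           (unsigned long long)max);
    return 0;
}

Converting cycles to time would need the TSC frequency; I've left the
numbers in cycles, like in the tables above.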
Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)