> Have a look at this series:
> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg02233.html

Thanks a lot! I'm looking through it at the moment; it looks very promising.

> And I've got another one that I'll send out asap (and I can Cc you).

Thanks in advance :)

> I usually enable a subset of them (one or more "classes") and try to
> figure out if I see the problem in the resulting trace. If yes, I try
> with a narrower subset. If not, I try with either a broader or a
> different one.

Well, I have doubts about how to interpret even the very basic info
xenalyze gives me. E.g. how can I measure intra-VM latencies, both global
(how much PCPU time did the hypervisor itself spend over the whole test
run) and local (the same for specific interrupts)? Why is domain 32767
(the default domain for cases when it's not clear which domain a trace
belongs to, according to the documentation) getting quite a lot of PCPU
time (does this mean the traces are incorrect, or is there some
significant problem in the setup)? What are concurrency_hazard, partial
contention, full_contention, etc. (these are from the xenalyze summary)?
How can I get the number of context switches (overall or average)?

Some subtler questions as well. E.g. I have a domain summary looking like
this:

|-- Domain 2 --|
 Runstates:
   blocked:     273  0.35s   7908 {  2093|  9561| 47811}
  partial run:    2284  1.27s   3420 {  6183|  6197|  6382}
  full run:    1322  0.10s    479 {    95|  3772|  6164}
  partial contention:     907  1.73s  11713 { 30655| 34266| 34305}
  concurrency_hazard:    2474  0.18s    435 {    48|  5681|  6206}
  full_contention:     381  0.02s    383 {    56| 36601| 36601}
...
-- v0 --
 Runstates:
   running:    1981  1.36s   4217 {  6193|  6215|  6242}
  runnable:     737  1.74s  14472 {   271| 36780| 38705}
        wake:     430  0.04s    632 {    67| 26049| 35549}
     preempt:     307  1.69s  33856 {   108| 36650| 39345}
   blocked:     430  0.56s   7974 {  1189| 21758| 60893}
 cpu affinity:     336  66914 {  3456| 52202|243760}
   [0]:     167  66156 {  3650| 57926|216477}
   [1]:     169  67663 {  3205| 44754|245733}
-- v1 --
 Runstates:
   running:    2773  0.29s    649 {    54|  6382|  6382}
  runnable:     874  0.22s   1520 {  5995| 36669| 36710}
        wake:     845  0.09s    640 {   452| 25366| 26313}
     preempt:      29  0.13s  27152 { 34413| 36708| 36710}
   blocked:     845  3.14s  22856 {  2477| 61224| 61422}
 cpu affinity:     391  57508 {  2788| 58686|128810}
   [0]:     196  59685 {  2834| 58664|128810}
   [1]:     195  55319 {  2770| 60622|130371}

It looks like Domain 2 had 0.10s of full run and 1.27s of partial run, but
its VCPU v0 was running for 1.36s and VCPU v1 for 0.29s. How do these
numbers relate, what exactly is partial run, and can I get some insight
from the concurrency_hazard or full_contention numbers?
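
One way to sanity-check a guess about how these numbers relate is sketched
below. It is a guess only, not something the xenalyze documentation
confirms. It assumes that, for this 2-VCPU domain, "partial run" and
"concurrency_hazard" each mean exactly one VCPU is running, and "full run"
means both are. The parsing regexes and the helper name are made up for
this illustration. Since the printed times are rounded to 0.01s, agreement
within a few hundredths of a second is the most one can expect.

```python
import re

# Lines copied from the summary above (only the ones needed for the check).
SUMMARY = """\
|-- Domain 2 --|
 Runstates:
   blocked:     273  0.35s   7908 {  2093|  9561| 47811}
  partial run:    2284  1.27s   3420 {  6183|  6197|  6382}
  full run:    1322  0.10s    479 {    95|  3772|  6164}
  partial contention:     907  1.73s  11713 { 30655| 34266| 34305}
  concurrency_hazard:    2474  0.18s    435 {    48|  5681|  6206}
  full_contention:     381  0.02s    383 {    56| 36601| 36601}
-- v0 --
 Runstates:
   running:    1981  1.36s   4217 {  6193|  6215|  6242}
-- v1 --
 Runstates:
   running:    2773  0.29s    649 {    54|  6382|  6382}
"""

# Hypothetical patterns: a section header such as "|-- Domain 2 --|" or
# "-- v0 --", and a stat line "<label>: <count> <seconds>s ...".
HEADER = re.compile(r"^\|?-- (?P<name>.+?) --\|?$")
STAT = re.compile(r"^\s*(?P<label>[A-Za-z_ ]+?):\s+\d+\s+(?P<secs>\d+\.\d+)s")


def seconds_by_section(text):
    """Return {section name: {state label: seconds}} for the summary text."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = sections.setdefault(header.group("name"), {})
            continue
        stat = STAT.match(line)
        if stat and current is not None:
            current[stat.group("label")] = float(stat.group("secs"))
    return sections


sections = seconds_by_section(SUMMARY)
dom = sections["Domain 2"]

# Total time the two VCPUs report as "running".
vcpu_running = sections["v0"]["running"] + sections["v1"]["running"]

# ASSUMPTION (unconfirmed): running-VCPU count per domain state is
# 1 for partial run, 2 for full run, 1 for concurrency_hazard.
domain_running = (1 * dom["partial run"]
                  + 2 * dom["full run"]
                  + 1 * dom["concurrency_hazard"])

print(f"sum of VCPU running time:     {vcpu_running:.2f}s")
print(f"weighted domain running time: {domain_running:.2f}s")
```

If the two totals agree, the assumed reading is at least consistent with
this one summary; it does not prove that this is how xenalyze defines the
states.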

I am trying to build up some understanding mostly from the xenalyze
sources, because the documentation does not go into any detail whatsoever,
but it is going pretty slowly.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
