Peter Xu <pet...@redhat.com> writes:

> Blocktime so far only cares about the time one vcpu (or the whole system)
> got blocked. It would be also be helpful if it can also report the latency
> of page requests, which could be very sensitive during postcopy.
>
> Blocktime itself is sometimes not very important, especially when one
> thinks about KVM async PF support, which means vCPUs are literally almost
> not blocked at all because the guest OS is smart enough to switch to
> another task when a remote fault is needed.
>
> However, latency is still sensitive and important because even if the guest
> vCPU is running on threads that do not need a remote fault, the workload
> that accesses some missing page is still affected.
>
> Add two entries to the report, showing how long it takes to resolve a
> remote fault. Mention in the QAPI doc that this is not the real average
> fault latency, but only the ones that was requested for a remote fault.
>
> Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
> meanwhile add the entry checks in qtests for all postcopy tests.
>
> Cc: Markus Armbruster <arm...@redhat.com>
> Cc: Dr. David Alan Gilbert <d...@treblig.org>
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
>  qapi/migration.json                   | 13 +++++
>  migration/migration-hmp-cmds.c        | 70 ++++++++++++++++++---------
>  migration/postcopy-ram.c              | 48 ++++++++++++------
>  tests/qtest/migration/migration-qmp.c |  3 ++
>  4 files changed, 97 insertions(+), 37 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8b9c53595c..8b13cea169 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -236,6 +236,17 @@
> # This is only present when the postcopy-blocktime migration
> # capability is enabled. (Since 3.0)
> #
> +# @postcopy-latency: average remote page fault latency (in us). Note that
> +# this doesn't include all faults, but only the ones that require a
> +# remote page request. So it should be always bigger than the real
> +# average page fault latency. This is only present when the
> +# postcopy-blocktime migration capability is enabled. (Since 10.1)
> +#
> +# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
> +# us). It has the same definition of @postcopy-latency, but instead
> +# this is the per-vCPU statistics. This is only present when the

Two spaces between sentences for consistency, please.

> +# postcopy-blocktime migration capability is enabled. (Since 10.1)

I figure the @i-th array element is for vCPU with index @i. Correct?

This is also only present when @postcopy-blocktime is enabled. Correct?

Could a QMP client compute @postcopy-latency from @postcopy-vcpu-latency?

> +#
> # @socket-address: Only used for tcp, to know what the real port is
> # (Since 4.0)
> #
> @@ -275,6 +286,8 @@
> '*blocked-reasons': ['str'],
> '*postcopy-blocktime': 'uint32',
> '*postcopy-vcpu-blocktime': ['uint32'],
> + '*postcopy-latency': 'uint64',
> + '*postcopy-vcpu-latency': ['uint64'],
> '*socket-address': ['SocketAddress'],
> '*dirty-limit-throttle-time-per-round': 'uint64',
> '*dirty-limit-ring-full-time': 'uint64'} }

[...]
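
To illustrate the question about deriving @postcopy-latency from
@postcopy-vcpu-latency, here is a minimal sketch. It assumes that
@postcopy-latency averages over all remote faults, which the doc comment
suggests but does not spell out. Both helpers and the per-vCPU fault counts
are hypothetical: they are not part of QEMU, and the schema in this patch
does not expose any counts. The point is only that a plain mean of per-vCPU
averages matches the overall average when every vCPU took the same number
of remote faults, and generally not otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical client-side helper (not a QEMU function): the plain mean of
 * the per-vCPU averages.  This is all a client can compute from the
 * @postcopy-vcpu-latency array on its own.
 */
uint64_t mean_of_vcpu_means(const uint64_t *avg_us, size_t n)
{
    uint64_t sum = 0;

    assert(avg_us != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum += avg_us[i];
    }
    return n ? sum / n : 0;
}

/*
 * Hypothetical helper: the mean over all remote faults, reconstructed from
 * per-vCPU averages weighted by per-vCPU fault counts.  The counts are an
 * assumption made for illustration only.  Overflow of the 64-bit
 * intermediate product is not handled in this sketch.
 */
uint64_t weighted_mean(const uint64_t *avg_us, const uint64_t *faults,
                       size_t n)
{
    uint64_t total_us = 0;
    uint64_t total_faults = 0;

    assert((avg_us != NULL && faults != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        total_us += avg_us[i] * faults[i];
        total_faults += faults[i];
    }
    return total_faults ? total_us / total_faults : 0;
}
```

With two vCPUs averaging 100 us and 1000 us over 900 and 100 remote faults
respectively, mean_of_vcpu_means() returns 550 while weighted_mean() returns
190. If the answer to the question is "no", the doc comment could say so.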