Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side

Peter Xu Tue, 25 Apr 2017 01:24:53 -0700

On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:

[...]


> +/*
> + * This function calculates downtime per cpu and trace it
> + *
> + *  Also it calculates total downtime as an interval's overlap,
> + *  for many vCPU.
> + *
> + *  The approach is following:
> + *  Initially intervals are represented in tree where key is
> + *  pagefault address, and values:
> + *   begin - page fault time
> + *   end   - page load time
> + *   cpus  - bit mask shows affected cpus
> + *
> + *  To calculate overlap on all cpus, intervals converted into
> + *  array of points in time (downtime_points), the size of
> + *  array is 2 * number of nodes in tree of intervals (2 array
> + *  elements per one in element of interval).
> + *  Each element is marked as end (E) or as start (S) of interval.
> + *  The overlap downtime will be calculated for SE, only in case
> + *  there is sequence S(0..N)E(M) for every vCPU.
> + *
> + * As example we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> 
> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> 
> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> 
> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include 
> CPU3
> + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
> + * Legend of picture is following: * - means downtime per vCPU
> + *                                 x - means overlapped downtime
> + */

Not sure whether I get the point in this patch... iiuc we defined the
downtime here as the period when all vcpus are halted, right?

If so, I have a few questions:

- will this algorithm consume lots of memory? since I see we have one
  trace object per fault page address

- do we need to protect the tree to make sure there's no insertion
  when doing the calculation?

- if the only thing we want here is the "total downtime", whether
  below would work? (assuming N is vcpu numbers)

  a. define array cpu_fault_addr[N], to store current faulted address
     for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
     be 0.

  b. when page fault happens on vcpu A, setup cpu_fault_addr[A] with
     corresponding fault address.

  c. when page copy finished, loop over cpu_fault_addr[] to see
     whether that matches any, clear corresponding element if matched.

  Then, we can just measure the period when cpu_fault_addr[] is all
  set (by tracing at both b. and c.). Can this work?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side

Reply via email to