On Thu, Jul 06, 2023 at 03:23:43PM -0400, Peter Xu wrote:
> On Thu, Jun 29, 2023 at 11:59:03AM +0300, Andrei Gudkov wrote:
> > Introduces alternative argument calc-time-ms, which is
> > the same as calc-time but accepts millisecond value.
> > Millisecond precision allows to make predictions whether
> > migration will succeed or not. To do this, calculate dirty
> > rate with calc-time-ms set to max allowed downtime, convert
> > measured rate into volume of dirtied memory, and divide by
> > network throughput. If the value is lower than max allowed
> > downtime, then migration will converge.
> > 
> > Measurement results for single thread randomly writing to
> > a 24GiB region:
> > +--------------+--------------------+
> > | calc-time-ms | dirty-rate (MiB/s) |
> > +--------------+--------------------+
> > |          100 |               1880 |
> > |          200 |               1340 |
> > |          300 |               1120 |
> > |          400 |               1030 |
> > |          500 |                868 |
> > |          750 |                720 |
> > |         1000 |                636 |
> > |         1500 |                498 |
> > |         2000 |                423 |
> > +--------------+--------------------+
> 
> Do you mean the dirty workload is constant?  Why it differs so much with
> different calc-time-ms?

The workload is as constant as it could be, but the naming is misleading:
what is called "dirty-rate" is in fact not a "rate" at all.
calc-dirty-rate measures the number of *uniquely* dirtied pages, i.e. each
page can contribute to the counter only once during the measurement period.
That's why the values decrease. Consider also the limiting argument:
since the VM has a fixed number of pages and each page can be dirtied only
once, dirty-rate = number-of-dirtied-pages/calc-time -> 0 as calc-time -> inf.
It would make more sense to report the number as "dirty-volume" --
without dividing it by calc-time.

Note that the number of *uniquely* dirtied pages in a given amount of time is
exactly what we need for making migration-related predictions. There is
no error here.

> 
> I also doubt the usefulness: on huge VMs, ms accuracy won't matter much
> because the enable/disable dirty logging overhead can already be at the ms
> level for those.
> 
The goal is to measure the volume of dirtied memory corresponding to the
desired downtime, which is typically on the order of 100-300ms. I think an
error of +/-10ms won't make any big difference. Maybe I should have called
this "millisecond granularity" rather than "millisecond precision".

> > 
> > Signed-off-by: Andrei Gudkov <gudkov.and...@huawei.com>
> > ---
> >  qapi/migration.json   | 15 ++++++--
> >  migration/dirtyrate.h | 12 ++++---
> >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> >  3 files changed, 68 insertions(+), 40 deletions(-)
> > 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 5bb5ab82a0..dd1afe1982 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1778,7 +1778,12 @@
> >  #
> >  # @start-time: start time in units of second for calculation
> >  #
> > -# @calc-time: time in units of second for sample dirty pages
> > +# @calc-time: time period for which dirty page rate was measured
> > +#     (rounded down to seconds).
> > +#
> > +# @calc-time-ms: actual time period for which dirty page rate was
> > +#     measured (in milliseconds).  Value may be larger than requested
> > +#     time period due to measurement overhead.
> >  #
> >  # @sample-pages: page count per GB for sample dirty pages the default
> >  #     value is 512 (since 6.1)
> > @@ -1796,6 +1801,7 @@
> >             'status': 'DirtyRateStatus',
> >             'start-time': 'int64',
> >             'calc-time': 'int64',
> > +           'calc-time-ms': 'int64',
> >             'sample-pages': 'uint64',
> >             'mode': 'DirtyRateMeasureMode',
> >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > @@ -1807,6 +1813,10 @@
> >  #
> >  # @calc-time: time in units of second for sample dirty pages
> >  #
> > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > +#    two arguments are mutually exclusive.  Exactly one of them must
> > +#    be specified. (Since 8.1)
> > +#
> >  # @sample-pages: page count per GB for sample dirty pages the default
> >  #     value is 512 (since 6.1)
> >  #
> > @@ -1821,7 +1831,8 @@
> >  #                                                 'sample-pages': 512} }
> >  # <- { "return": {} }
> >  ##
> > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > +                                         '*calc-time-ms': 'int64',
> >                                           '*sample-pages': 'int',
> >                                           '*mode': 'DirtyRateMeasureMode'} }
> >  
> > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > index 594a5c0bb6..869c060941 100644
> > --- a/migration/dirtyrate.h
> > +++ b/migration/dirtyrate.h
> > @@ -31,10 +31,12 @@
> >  #define MIN_RAMBLOCK_SIZE                         128
> >  
> >  /*
> > - * Take 1s as minimum time for calculation duration
> > + * Allowed range for dirty page rate calculation (in milliseconds).
> > + * Lower limit relates to the smallest realistic downtime it
> > + * makes sense to impose on migration.
> >   */
> > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > +#define MIN_CALC_TIME_MS                          50
> > +#define MAX_CALC_TIME_MS                       60000
> >  
> >  /*
> >   * Take 1/16 pages in 1G as the maxmum sample page count
> > @@ -44,7 +46,7 @@
> >  
> >  struct DirtyRateConfig {
> >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > -    int64_t sample_period_seconds; /* time duration between two sampling */
> > +    int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
> >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> >  };
> >  
> > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> >  struct DirtyRateStat {
> >      int64_t dirty_rate; /* dirty rate in MB/s */
> >      int64_t start_time; /* calculation start time in units of second */
> > -    int64_t calc_time; /* time duration of two sampling in units of second */
> > +    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
> >      uint64_t sample_pages; /* sample pages per GB */
> >      union {
> >          SampleVMStat page_sampling;
> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > index 84f1b0fb20..90fb336329 100644
> > --- a/migration/dirtyrate.c
> > +++ b/migration/dirtyrate.c
> > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
> >          msec = current_time - initial_time;
> >      } else {
> >          g_usleep((msec + initial_time - current_time) * 1000);
> > +        /* g_usleep may overshoot */
> > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> 
> This removes the g_usleep(), is it intended?  I thought it was the code to
> do the delay.  What is the solution to g_usleep() then?
> 
It is still there -- just above the comment. What I added is a
qemu_clock_get_ms() call after the g_usleep(). It sets the msec variable
to the true amount of time that elapsed between measurements. Previously
I observed that the actual sleep time may be slightly longer than the
requested value.

> >      }
> >  
> >      return msec;
> > @@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
> >  {
> >      uint64_t increased_dirty_pages =
> >          dirty_pages.end_pages - dirty_pages.start_pages;
> > -    uint64_t memory_size_MiB = qemu_target_pages_to_MiB(increased_dirty_pages);
> > -
> > -    return memory_size_MiB * 1000 / calc_time_ms;
> > +    /*
> > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > +     * to avoid losing precision
> > +     */
> > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > +        calc_time_ms;
> >  }
> >  
> >  void global_dirty_log_change(unsigned int flag, bool start)
> > @@ -183,10 +188,9 @@ retry:
> >      return duration;
> >  }
> >  
> > -static bool is_sample_period_valid(int64_t sec)
> > +static bool is_calc_time_valid(int64_t msec)
> >  {
> > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> >          return false;
> >      }
> >  
> > @@ -219,7 +223,8 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> >  
> >      info->status = CalculatingState;
> >      info->start_time = DirtyStat.start_time;
> > -    info->calc_time = DirtyStat.calc_time;
> > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> >      info->sample_pages = DirtyStat.sample_pages;
> >      info->mode = dirtyrate_mode;
> >  
> > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> >  {
> >      DirtyStat.dirty_rate = -1;
> >      DirtyStat.start_time = start_time;
> > -    DirtyStat.calc_time = config.sample_period_seconds;
> > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> >  
> >      switch (config.mode) {
> > @@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
> >  
> >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
> >  {
> > -    int64_t msec = 0;
> >      int64_t start_time;
> >      DirtyPageRecord dirty_pages;
> >  
> > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
> >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      DirtyStat.start_time = start_time / 1000;
> >  
> > -    msec = config.sample_period_seconds * 1000;
> > -    msec = dirty_stat_wait(msec, start_time);
> > -    DirtyStat.calc_time = msec / 1000;
> > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms, start_time);
> >  
> >      /*
> >       * do two things.
> > @@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
> >  
> >      record_dirtypages_bitmap(&dirty_pages, false);
> >  
> > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > +                                                  DirtyStat.calc_time_ms);
> >  }
> >  
> >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> >  {
> > -    int64_t duration;
> >      uint64_t dirtyrate = 0;
> >      uint64_t dirtyrate_sum = 0;
> >      int i = 0;
> > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
> >  
> >      /* calculate vcpu dirtyrate */
> > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds * 1000,
> > -                                        &DirtyStat.dirty_ring,
> > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > -                                        true);
> > -
> > -    DirtyStat.calc_time = duration / 1000;
> > +    DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
> > +                                                      &DirtyStat.dirty_ring,
> > +                                                      GLOBAL_DIRTY_DIRTY_RATE,
> > +                                                      true);
> >  
> >      /* calculate vm dirtyrate */
> >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
> >  {
> >      struct RamblockDirtyInfo *block_dinfo = NULL;
> >      int block_count = 0;
> > -    int64_t msec = 0;
> >      int64_t initial_time;
> >  
> >      rcu_read_lock();
> > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
> >      }
> >      rcu_read_unlock();
> >  
> > -    msec = config.sample_period_seconds * 1000;
> > -    msec = dirty_stat_wait(msec, initial_time);
> > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > +                                             initial_time);
> >      DirtyStat.start_time = initial_time / 1000;
> > -    DirtyStat.calc_time = msec / 1000;
> >  
> >      rcu_read_lock();
> >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> >          goto out;
> >      }
> >  
> > -    update_dirtyrate(msec);
> > +    update_dirtyrate(DirtyStat.calc_time_ms);
> >  
> >  out:
> >      rcu_read_unlock();
> > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> >      return NULL;
> >  }
> >  
> > -void qmp_calc_dirty_rate(int64_t calc_time,
> > +void qmp_calc_dirty_rate(bool has_calc_time,
> > +                         int64_t calc_time,
> > +                         bool has_calc_time_ms,
> > +                         int64_t calc_time_ms,
> >                           bool has_sample_pages,
> >                           int64_t sample_pages,
> >                           bool has_mode,
> > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> >          return;
> >      }
> >  
> > -    if (!is_sample_period_valid(calc_time)) {
> > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
> > +                         " be specified");
> > +        return;
> > +    }
> > +    if (has_calc_time) {
> > +        /*
> > +         * The worst thing that can happen due to overflow is that
> > +         * invalid value will become valid.
> > +         */
> > +        calc_time_ms = calc_time * 1000;
> > +    }
> > +    if (!is_calc_time_valid(calc_time_ms)) {
> > +        error_setg(errp, "Calculation time is out of range[%dms, %dms].",
> > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> >          return;
> >      }
> >  
> > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> >          return;
> >      }
> >  
> > -    config.sample_period_seconds = calc_time;
> > +    config.calc_time_ms = calc_time_ms;
> >      config.sample_pages_per_gigabytes = sample_pages;
> >      config.mode = mode;
> >  
> > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
> >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> >      }
> >  
> > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > -                        mode, &err);
> > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > +                        false, 0, /* calc_time_ms */
> > +                        has_sample_pages, sample_pages,
> > +                        true, mode,
> > +                        &err);
> >      if (err) {
> >          hmp_handle_error(mon, err);
> >          return;
> > -- 
> > 2.30.2
> > 
> 
> -- 
> Peter Xu
