On Thu, 3 Aug 2023, Matthew Malcomson wrote:

> > 
> > I think this is undesriable.  With fused you mean we use FMA?
> > I think you could use -ffp-contract=off for the TU instead.
> > 
> > Note you can't use __attribute__((noinline)) literally since the
> > host compiler might not support this.
> > 
> > Richard.
> > 
> 
> 
> Trying to make the timevar store integral time intervals.
> Hope this is acceptable -- I had originally planned to use
> `-ffp-contract` as agreed until I saw the email mentioning the old x86
> bug in the same area which was not to do with floating point contraction
> of operations (PR 99903) and figured it would be better to try and solve
> both at the same time while making things in general a bit more robust.
> ________________________________________________________________________
> 
> 
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> ------------------------------
> Avoiding the inlining which was agreed to be undesirable.  Three
> alternative approaches:
> 1) Use `-ffp-contract=on` to avoid this particular optimisation.
> 2) Adjusting the code so that the "tolerance" is always of the order of
>    a "tick".
> 3) Recording times and elapsed differences in integral values.
>    - Could be in terms of a standard measurement (e.g. nanoseconds or
>      microseconds).
>    - Could be in terms of whatever integral value ("ticks" /
>      seconds&microseconds / "clock ticks") is returned from the syscall
>      chosen at configure time.
> 
> While `-ffp-contract=on` removes the problem that I bumped into, there
> has been a similar bug on x86 that was to do with a different floating
> point problem that also happens after `get_time` and
> `timevar_accumulate` both being inlined into the same function.  Hence
> it seems worth choosing a different approach.
> 
> Of the two other solutions, recording measurements in integral values
> seems the most robust against slightly "off" measurements being
> presented to the user -- even though it could avoid the ICE that creates
> a flaky test.
> 
> I considered storing time in whatever units our syscall returns and
> normalising them at the time we print out rather than normalising them
> to nanoseconds at the point we record our "current time".  The logic
> being that normalisation could have some rounding affect (e.g. if
> TICKS_PER_SECOND is 3) that would be taken into account in calculations.
> 
> I decided against it in order to give the values recorded in
> `timevar_time_def` some interpretive value so it's easier to read the
> code.  Compared to the small rounding that would represent a tiny amount
> of time and AIUI can not trigger the same kind of ICE's as we are
> attempting to fix, said interpretive value seems more valuable.
> 
> Recording time in microseconds seemed reasonable since all obvious
> values for ticks and `getrusage` are at microsecond granularity or less
> precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
> variables given to use by the host system I was not sure of that enough
> to make this decision.
> 
> ------------------------------
> timer::all_zero is ignoring rows which are inconsequential to the user
> and would be printed out as all zeros.  Since upon printing rows we
> convert to the same double value and print out the same precision as
> before, we return true/false based on the same amount of time as before.
> 
> timer::print_row casts to a floating point measurement in units of
> seconds as was printed out before.
> 
> timer::validate_phases -- I'm printing out nanoseconds here rather than
> floating point seconds since this is an error message for when things
> have "gone wrong" printing out the actual nanoseconds that have been
> recorded seems like the best approach.
> N.b. since we now print out nanoseconds instead of floating point value
> the padding requirements are different.  Originally we were padding to
> 24 characters and printing 18 decimal places.  This looked odd with the
> now visually smaller values getting printed.  I judged 13 characters
> (corresponding to 2 hours) to be a reasonable point at which our
> alignment could start to degrade and this provides a more compact output
> for the majority of cases (checked by triggering the error case via
> GDB).
> 
> ------------------------------
> N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
> would fit in an integer on all hosts that GCC supports, but am not
> certain there are not strange integer sizes we support hence am pointing
> it out for special attention during review.
> 
> ------------------------------
> No expected change in generated code.
> Bootstrapped and regtested on AArch64 with no regressions.
> Manually checked that flaky test is no longer flaky on the machine it
> was seen before.

The patch looks OK to me if it passes bootstrap / regtest and the
output of -ftime-report doesn't change (too much).

Thanks,
Richard.

> gcc/ChangeLog:
> 
>       PR middle-end/110316
>       PR middle-end/9903
>       * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
>       CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
>       (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
>       (timer::validate_phases): Use integral arithmetic to check
>       validity.
>       (timer::print_row, timer::print): Convert from integral
>       nanoseconds to floating point seconds before printing.
>       (timer::all_zero): Change limit to nanosec count instead of
>       fractional count of seconds.
>       * timevar.h (struct timevar_time_def): Update all measurements
>       to use uint64_t nanoseconds rather than seconds stored in a
>       double.
> 
> 
> 
> ###############     Attachment also inlined for ease of reply    
> ###############
> 
> 
> diff --git a/gcc/timevar.h b/gcc/timevar.h
> index 
> ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05
>  100644
> --- a/gcc/timevar.h
> +++ b/gcc/timevar.h
> @@ -44,7 +44,7 @@
>  */
>  
>  /* This structure stores the various varieties of time that can be
> -   measured.  Times are stored in seconds.  The time may be an
> +   measured.  Times are stored in nanoseconds.  The time may be an
>     absolute time or a time difference; in the former case, the time
>     base is undefined, except that the difference between two times
>     produces a valid time difference.  */
> @@ -52,14 +52,13 @@
>  struct timevar_time_def
>  {
>    /* User time in this process.  */
> -  double user;
> +  uint64_t user;
>  
> -  /* System time (if applicable for this host platform) in this
> -     process.  */
> -  double sys;
> +  /* System time (if applicable for this host platform) in this process.  */
> +  uint64_t sys;
>  
>    /* Wall clock time.  */
> -  double wall;
> +  uint64_t wall;
>  
>    /* Garbage collector memory.  */
>    size_t ggc_mem;
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index 
> d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7
>  100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -42,7 +42,7 @@ struct tms
>  # define RUSAGE_SELF 0
>  #endif
>  
> -/* Calculation of scale factor to convert ticks to microseconds.
> +/* Calculation of scale factor to convert ticks to seconds.
>     We mustn't use CLOCKS_PER_SEC except with clock().  */
>  #if HAVE_SYSCONF && defined _SC_CLK_TCK
>  # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
> @@ -91,14 +91,15 @@ struct tms
>     the underlying constants, and that can be very slow, so we have to
>     precompute them.  Whose wonderful idea was it to make all those
>     _constants_ variable at run time, anyway?  */
> +#define NANOSEC_PER_SEC 1000000000
>  #ifdef USE_TIMES
> -static double ticks_to_msec;
> -#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
> +static uint64_t ticks_to_nanosec;
> +#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
>  #endif
>  
>  #ifdef USE_CLOCK
> -static double clocks_to_msec;
> -#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
> +static uint64_t clocks_to_nanosec;
> +#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
>  #endif
>  
>  /* Non-NULL if timevars should be used.  In GCC, this happens with
> @@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now)
>    {
>  #ifdef USE_TIMES
>      struct tms tms;
> -    now->wall = times (&tms)  * ticks_to_msec;
> -    now->user = tms.tms_utime * ticks_to_msec;
> -    now->sys  = tms.tms_stime * ticks_to_msec;
> +    now->wall = times (&tms)  * ticks_to_nanosec;
> +    now->user = tms.tms_utime * ticks_to_nanosec;
> +    now->sys  = tms.tms_stime * ticks_to_nanosec;
>  #endif
>  #ifdef USE_GETRUSAGE
>      struct rusage rusage;
>      getrusage (RUSAGE_SELF, &rusage);
> -    now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
> -    now->sys  = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
> +    now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
> +             + rusage.ru_utime.tv_usec * 1000;
> +    now->sys  = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
> +             + rusage.ru_stime.tv_usec * 1000;
>  #endif
>  #ifdef USE_CLOCK
> -    now->user = clock () * clocks_to_msec;
> +    now->user = clock () * clocks_to_nanosec;
>  #endif
>    }
>  }
> @@ -271,10 +274,10 @@ timer::timer () :
>    /* Initialize configuration-specific state.
>       Ideally this would be one-time initialization.  */
>  #ifdef USE_TIMES
> -  ticks_to_msec = TICKS_TO_MSEC;
> +  ticks_to_nanosec = TICKS_TO_NANOSEC;
>  #endif
>  #ifdef USE_CLOCK
> -  clocks_to_msec = CLOCKS_TO_MSEC;
> +  clocks_to_nanosec = CLOCKS_TO_NANOSEC;
>  #endif
>  }
>  
> @@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const
>  {
>    unsigned int /* timevar_id_t */ id;
>    const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
> -  double phase_user = 0.0;
> -  double phase_sys = 0.0;
> -  double phase_wall = 0.0;
> +  uint64_t phase_user = 0;
> +  uint64_t phase_sys = 0;
> +  uint64_t phase_wall = 0;
>    size_t phase_ggc_mem = 0;
>    static char phase_prefix[] = "phase ";
> -  const double tolerance = 1.000001;  /* One part in a million.  */
>  
>    for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
>      {
> @@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const
>       }
>      }
>  
> -  if (phase_user > total->user * tolerance
> -      || phase_sys > total->sys * tolerance
> -      || phase_wall > total->wall * tolerance
> -      || phase_ggc_mem > total->ggc_mem * tolerance)
> +  if (phase_user > total->user
> +      || phase_sys > total->sys
> +      || phase_wall > total->wall
> +      || phase_ggc_mem > total->ggc_mem)
>      {
>  
>        fprintf (fp, "Timing error: total of phase timers exceeds total 
> time.\n");
>        if (phase_user > total->user)
> -     fprintf (fp, "user    %24.18e > %24.18e\n", phase_user, total->user);
> +     fprintf (fp, "user    %13" PRIu64 " > %13" PRIu64 "\n",
> +              phase_user, total->user);
>        if (phase_sys > total->sys)
> -     fprintf (fp, "sys     %24.18e > %24.18e\n", phase_sys, total->sys);
> +     fprintf (fp, "sys     %13" PRIu64 " > %13" PRIu64 "\n",
> +              phase_sys, total->sys);
>        if (phase_wall > total->wall)
> -     fprintf (fp, "wall    %24.18e > %24.18e\n", phase_wall, total->wall);
> +     fprintf (fp, "wall    %13" PRIu64 " > %13" PRIu64 "\n",
> +              phase_wall, total->wall);
>        if (phase_ggc_mem > total->ggc_mem)
> -     fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
> +     fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
>                (unsigned long)total->ggc_mem);
>        gcc_unreachable ();
>      }
>  }
>  
> +#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
> +#define percent_of(TOTAL, SUBTOTAL) \
> +  ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
>  /* Helper function for timer::print.  */
>  void
> @@ -640,22 +648,22 @@ timer::print_row (FILE *fp,
>  #ifdef HAVE_USER_TIME
>    /* Print user-mode time for this process.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -        elapsed.user,
> -        (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
> +        nanosec_to_floating_sec (elapsed.user),
> +        percent_of (total->user, elapsed.user));
>  #endif /* HAVE_USER_TIME */
>  
>  #ifdef HAVE_SYS_TIME
>    /* Print system-mode time for this process.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -        elapsed.sys,
> -        (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
> +        nanosec_to_floating_sec (elapsed.sys),
> +        percent_of (total->sys, elapsed.sys));
>  #endif /* HAVE_SYS_TIME */
>  
>  #ifdef HAVE_WALL_TIME
>    /* Print wall clock time elapsed.  */
>    fprintf (fp, "%7.2f (%3.0f%%)",
> -        elapsed.wall,
> -        (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
> +        nanosec_to_floating_sec (elapsed.wall),
> +        percent_of (total->wall, elapsed.wall));
>  #endif /* HAVE_WALL_TIME */
>  
>    /* Print the amount of ggc memory allocated.  */
> @@ -673,7 +681,8 @@ timer::print_row (FILE *fp,
>  bool
>  timer::all_zero (const timevar_time_def &elapsed)
>  {
> -  const double tiny = 5e-3;
> +  /* 5000000 nanosec == 5e-3 seconds.  */
> +  uint64_t tiny = 5000000;
>    return (elapsed.user < tiny
>         && elapsed.sys < tiny
>         && elapsed.wall < tiny
> @@ -766,13 +775,13 @@ timer::print (FILE *fp)
>    /* Print total time.  */
>    fprintf (fp, " %-35s:", "TOTAL");
>  #ifdef HAVE_USER_TIME
> -  fprintf (fp, "%7.2f      ", total->user);
> +  fprintf (fp, "%7.2f      ", nanosec_to_floating_sec (total->user));
>  #endif
>  #ifdef HAVE_SYS_TIME
> -  fprintf (fp, "%8.2f      ", total->sys);
> +  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->sys));
>  #endif
>  #ifdef HAVE_WALL_TIME
> -  fprintf (fp, "%8.2f      ", total->wall);
> +  fprintf (fp, "%8.2f      ", nanosec_to_floating_sec (total->wall));
>  #endif
>    fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
>  
> @@ -790,6 +799,8 @@ timer::print (FILE *fp)
>  
>    validate_phases (fp);
>  }
> +#undef nanosec_to_floating_sec
> +#undef percent_of
>  
>  /* Get the name of the topmost item.  For use by jit for validating
>     inputs to gcc_jit_timer_pop.  */
> 
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to