On Thu, 3 Aug 2023, Matthew Malcomson wrote: > > > > I think this is undesriable. With fused you mean we use FMA? > > I think you could use -ffp-contract=off for the TU instead. > > > > Note you can't use __attribute__((noinline)) literally since the > > host compiler might not support this. > > > > Richard. > > > > > Trying to make the timevar store integral time intervals. > Hope this is acceptable -- I had originally planned to use > `-ffp-contract` as agreed until I saw the email mentioning the old x86 > bug in the same area which was not to do with floating point contraction > of operations (PR 99903) and figured it would be better to try and solve > both at the same time while making things in general a bit more robust. > ________________________________________________________________________ > > > On some AArch64 bootstrapped builds, we were getting a flaky test > because the floating point operations in `get_time` were being fused > with the floating point operations in `timevar_accumulate`. > > This meant that the rounding behaviour of our multiplication with > `ticks_to_msec` was different when used in `timer::start` and when > performed in `timer::stop`. These extra inaccuracies led to the > testcase `g++.dg/ext/timevar1.C` being flaky on some hardware. > > ------------------------------ > Avoiding the inlining which was agreed to be undesirable. Three > alternative approaches: > 1) Use `-ffp-contract=on` to avoid this particular optimisation. > 2) Adjusting the code so that the "tolerance" is always of the order of > a "tick". > 3) Recording times and elapsed differences in integral values. > - Could be in terms of a standard measurement (e.g. nanoseconds or > microseconds). > - Could be in terms of whatever integral value ("ticks" / > secondsµseconds / "clock ticks") is returned from the syscall > chosen at configure time. > > While `-ffp-contract=on` removes the problem that I bumped into, there > has been a similar bug on x86 that was to do with a different floating > point problem that also happens after `get_time` and > `timevar_accumulate` both being inlined into the same function. Hence > it seems worth choosing a different approach. > > Of the two other solutions, recording measurements in integral values > seems the most robust against slightly "off" measurements being > presented to the user -- even though it could avoid the ICE that creates > a flaky test. > > I considered storing time in whatever units our syscall returns and > normalising them at the time we print out rather than normalising them > to nanoseconds at the point we record our "current time". The logic > being that normalisation could have some rounding affect (e.g. if > TICKS_PER_SECOND is 3) that would be taken into account in calculations. > > I decided against it in order to give the values recorded in > `timevar_time_def` some interpretive value so it's easier to read the > code. Compared to the small rounding that would represent a tiny amount > of time and AIUI can not trigger the same kind of ICE's as we are > attempting to fix, said interpretive value seems more valuable. > > Recording time in microseconds seemed reasonable since all obvious > values for ticks and `getrusage` are at microsecond granularity or less > precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both > variables given to use by the host system I was not sure of that enough > to make this decision. > > ------------------------------ > timer::all_zero is ignoring rows which are inconsequential to the user > and would be printed out as all zeros. Since upon printing rows we > convert to the same double value and print out the same precision as > before, we return true/false based on the same amount of time as before. > > timer::print_row casts to a floating point measurement in units of > seconds as was printed out before. > > timer::validate_phases -- I'm printing out nanoseconds here rather than > floating point seconds since this is an error message for when things > have "gone wrong" printing out the actual nanoseconds that have been > recorded seems like the best approach. > N.b. since we now print out nanoseconds instead of floating point value > the padding requirements are different. Originally we were padding to > 24 characters and printing 18 decimal places. This looked odd with the > now visually smaller values getting printed. I judged 13 characters > (corresponding to 2 hours) to be a reasonable point at which our > alignment could start to degrade and this provides a more compact output > for the majority of cases (checked by triggering the error case via > GDB). > > ------------------------------ > N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe this > would fit in an integer on all hosts that GCC supports, but am not > certain there are not strange integer sizes we support hence am pointing > it out for special attention during review. > > ------------------------------ > No expected change in generated code. > Bootstrapped and regtested on AArch64 with no regressions. > Manually checked that flaky test is no longer flaky on the machine it > was seen before.
The patch looks OK to me if it passes bootstrap / regtest and the output of -ftime-report doesn't change (too much). Thanks, Richard. > gcc/ChangeLog: > > PR middle-end/110316 > PR middle-end/9903 > * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC, > CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New. > (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros. > (timer::validate_phases): Use integral arithmetic to check > validity. > (timer::print_row, timer::print): Convert from integral > nanoseconds to floating point seconds before printing. > (timer::all_zero): Change limit to nanosec count instead of > fractional count of seconds. > * timevar.h (struct timevar_time_def): Update all measurements > to use uint64_t nanoseconds rather than seconds stored in a > double. > > > > ############### Attachment also inlined for ease of reply > ############### > > > diff --git a/gcc/timevar.h b/gcc/timevar.h > index > ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9073149709dd05 > 100644 > --- a/gcc/timevar.h > +++ b/gcc/timevar.h > @@ -44,7 +44,7 @@ > */ > > /* This structure stores the various varieties of time that can be > - measured. Times are stored in seconds. The time may be an > + measured. Times are stored in nanoseconds. The time may be an > absolute time or a time difference; in the former case, the time > base is undefined, except that the difference between two times > produces a valid time difference. */ > @@ -52,14 +52,13 @@ > struct timevar_time_def > { > /* User time in this process. */ > - double user; > + uint64_t user; > > - /* System time (if applicable for this host platform) in this > - process. */ > - double sys; > + /* System time (if applicable for this host platform) in this process. */ > + uint64_t sys; > > /* Wall clock time. */ > - double wall; > + uint64_t wall; > > /* Garbage collector memory. */ > size_t ggc_mem; > diff --git a/gcc/timevar.cc b/gcc/timevar.cc > index > d695297aae7f6b2a6de01a37fe86c2a232338df0..c6094b1c336c30e5a404ff8fd5c79795edbd59f7 > 100644 > --- a/gcc/timevar.cc > +++ b/gcc/timevar.cc > @@ -42,7 +42,7 @@ struct tms > # define RUSAGE_SELF 0 > #endif > > -/* Calculation of scale factor to convert ticks to microseconds. > +/* Calculation of scale factor to convert ticks to seconds. > We mustn't use CLOCKS_PER_SEC except with clock(). */ > #if HAVE_SYSCONF && defined _SC_CLK_TCK > # define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */ > @@ -91,14 +91,15 @@ struct tms > the underlying constants, and that can be very slow, so we have to > precompute them. Whose wonderful idea was it to make all those > _constants_ variable at run time, anyway? */ > +#define NANOSEC_PER_SEC 1000000000 > #ifdef USE_TIMES > -static double ticks_to_msec; > -#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND) > +static uint64_t ticks_to_nanosec; > +#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND) > #endif > > #ifdef USE_CLOCK > -static double clocks_to_msec; > -#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC) > +static uint64_t clocks_to_nanosec; > +#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC) > #endif > > /* Non-NULL if timevars should be used. In GCC, this happens with > @@ -222,18 +223,20 @@ get_time (struct timevar_time_def *now) > { > #ifdef USE_TIMES > struct tms tms; > - now->wall = times (&tms) * ticks_to_msec; > - now->user = tms.tms_utime * ticks_to_msec; > - now->sys = tms.tms_stime * ticks_to_msec; > + now->wall = times (&tms) * ticks_to_nanosec; > + now->user = tms.tms_utime * ticks_to_nanosec; > + now->sys = tms.tms_stime * ticks_to_nanosec; > #endif > #ifdef USE_GETRUSAGE > struct rusage rusage; > getrusage (RUSAGE_SELF, &rusage); > - now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6; > - now->sys = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6; > + now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC > + + rusage.ru_utime.tv_usec * 1000; > + now->sys = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC > + + rusage.ru_stime.tv_usec * 1000; > #endif > #ifdef USE_CLOCK > - now->user = clock () * clocks_to_msec; > + now->user = clock () * clocks_to_nanosec; > #endif > } > } > @@ -271,10 +274,10 @@ timer::timer () : > /* Initialize configuration-specific state. > Ideally this would be one-time initialization. */ > #ifdef USE_TIMES > - ticks_to_msec = TICKS_TO_MSEC; > + ticks_to_nanosec = TICKS_TO_NANOSEC; > #endif > #ifdef USE_CLOCK > - clocks_to_msec = CLOCKS_TO_MSEC; > + clocks_to_nanosec = CLOCKS_TO_NANOSEC; > #endif > } > > @@ -583,12 +586,11 @@ timer::validate_phases (FILE *fp) const > { > unsigned int /* timevar_id_t */ id; > const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed; > - double phase_user = 0.0; > - double phase_sys = 0.0; > - double phase_wall = 0.0; > + uint64_t phase_user = 0; > + uint64_t phase_sys = 0; > + uint64_t phase_wall = 0; > size_t phase_ggc_mem = 0; > static char phase_prefix[] = "phase "; > - const double tolerance = 1.000001; /* One part in a million. */ > > for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id) > { > @@ -607,26 +609,32 @@ timer::validate_phases (FILE *fp) const > } > } > > - if (phase_user > total->user * tolerance > - || phase_sys > total->sys * tolerance > - || phase_wall > total->wall * tolerance > - || phase_ggc_mem > total->ggc_mem * tolerance) > + if (phase_user > total->user > + || phase_sys > total->sys > + || phase_wall > total->wall > + || phase_ggc_mem > total->ggc_mem) > { > > fprintf (fp, "Timing error: total of phase timers exceeds total > time.\n"); > if (phase_user > total->user) > - fprintf (fp, "user %24.18e > %24.18e\n", phase_user, total->user); > + fprintf (fp, "user %13" PRIu64 " > %13" PRIu64 "\n", > + phase_user, total->user); > if (phase_sys > total->sys) > - fprintf (fp, "sys %24.18e > %24.18e\n", phase_sys, total->sys); > + fprintf (fp, "sys %13" PRIu64 " > %13" PRIu64 "\n", > + phase_sys, total->sys); > if (phase_wall > total->wall) > - fprintf (fp, "wall %24.18e > %24.18e\n", phase_wall, total->wall); > + fprintf (fp, "wall %13" PRIu64 " > %13" PRIu64 "\n", > + phase_wall, total->wall); > if (phase_ggc_mem > total->ggc_mem) > - fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem, > + fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem, > (unsigned long)total->ggc_mem); > gcc_unreachable (); > } > } > > +#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9) > +#define percent_of(TOTAL, SUBTOTAL) \ > + ((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100) > /* Helper function for timer::print. */ > void > @@ -640,22 +648,22 @@ timer::print_row (FILE *fp, > #ifdef HAVE_USER_TIME > /* Print user-mode time for this process. */ > fprintf (fp, "%7.2f (%3.0f%%)", > - elapsed.user, > - (total->user == 0 ? 0 : elapsed.user / total->user) * 100); > + nanosec_to_floating_sec (elapsed.user), > + percent_of (total->user, elapsed.user)); > #endif /* HAVE_USER_TIME */ > > #ifdef HAVE_SYS_TIME > /* Print system-mode time for this process. */ > fprintf (fp, "%7.2f (%3.0f%%)", > - elapsed.sys, > - (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100); > + nanosec_to_floating_sec (elapsed.sys), > + percent_of (total->sys, elapsed.sys)); > #endif /* HAVE_SYS_TIME */ > > #ifdef HAVE_WALL_TIME > /* Print wall clock time elapsed. */ > fprintf (fp, "%7.2f (%3.0f%%)", > - elapsed.wall, > - (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100); > + nanosec_to_floating_sec (elapsed.wall), > + percent_of (total->wall, elapsed.wall)); > #endif /* HAVE_WALL_TIME */ > > /* Print the amount of ggc memory allocated. */ > @@ -673,7 +681,8 @@ timer::print_row (FILE *fp, > bool > timer::all_zero (const timevar_time_def &elapsed) > { > - const double tiny = 5e-3; > + /* 5000000 nanosec == 5e-3 seconds. */ > + uint64_t tiny = 5000000; > return (elapsed.user < tiny > && elapsed.sys < tiny > && elapsed.wall < tiny > @@ -766,13 +775,13 @@ timer::print (FILE *fp) > /* Print total time. */ > fprintf (fp, " %-35s:", "TOTAL"); > #ifdef HAVE_USER_TIME > - fprintf (fp, "%7.2f ", total->user); > + fprintf (fp, "%7.2f ", nanosec_to_floating_sec (total->user)); > #endif > #ifdef HAVE_SYS_TIME > - fprintf (fp, "%8.2f ", total->sys); > + fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->sys)); > #endif > #ifdef HAVE_WALL_TIME > - fprintf (fp, "%8.2f ", total->wall); > + fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->wall)); > #endif > fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem)); > > @@ -790,6 +799,8 @@ timer::print (FILE *fp) > > validate_phases (fp); > } > +#undef nanosec_to_floating_sec > +#undef percent_of > > /* Get the name of the topmost item. For use by jit for validating > inputs to gcc_jit_timer_pop. */ > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)