On Thu, 2023-08-03 at 14:38 +0100, Matthew Malcomson via Gcc-patches wrote: > > > > I think this is undesriable. With fused you mean we use FMA? > > I think you could use -ffp-contract=off for the TU instead. > > > > Note you can't use __attribute__((noinline)) literally since the > > host compiler might not support this. > > > > Richard. > > > > > Trying to make the timevar store integral time intervals. > Hope this is acceptable -- I had originally planned to use > `-ffp-contract` as agreed until I saw the email mentioning the old > x86 > bug in the same area which was not to do with floating point > contraction > of operations (PR 99903) and figured it would be better to try and > solve > both at the same time while making things in general a bit more > robust. > _____________________________________________________________________ > ___ > > > On some AArch64 bootstrapped builds, we were getting a flaky test > because the floating point operations in `get_time` were being fused > with the floating point operations in `timevar_accumulate`. > > This meant that the rounding behaviour of our multiplication with > `ticks_to_msec` was different when used in `timer::start` and when > performed in `timer::stop`. These extra inaccuracies led to the > testcase `g++.dg/ext/timevar1.C` being flaky on some hardware. > > ------------------------------ > Avoiding the inlining which was agreed to be undesirable. Three > alternative approaches: > 1) Use `-ffp-contract=on` to avoid this particular optimisation. > 2) Adjusting the code so that the "tolerance" is always of the order > of > a "tick". > 3) Recording times and elapsed differences in integral values. > - Could be in terms of a standard measurement (e.g. nanoseconds or > microseconds). > - Could be in terms of whatever integral value ("ticks" / > secondsµseconds / "clock ticks") is returned from the > syscall > chosen at configure time. > > While `-ffp-contract=on` removes the problem that I bumped into, > there > has been a similar bug on x86 that was to do with a different > floating > point problem that also happens after `get_time` and > `timevar_accumulate` both being inlined into the same function. > Hence > it seems worth choosing a different approach. > > Of the two other solutions, recording measurements in integral values > seems the most robust against slightly "off" measurements being > presented to the user -- even though it could avoid the ICE that > creates > a flaky test. > > I considered storing time in whatever units our syscall returns and > normalising them at the time we print out rather than normalising > them > to nanoseconds at the point we record our "current time". The logic > being that normalisation could have some rounding affect (e.g. if > TICKS_PER_SECOND is 3) that would be taken into account in > calculations. > > I decided against it in order to give the values recorded in > `timevar_time_def` some interpretive value so it's easier to read the > code. Compared to the small rounding that would represent a tiny > amount > of time and AIUI can not trigger the same kind of ICE's as we are > attempting to fix, said interpretive value seems more valuable. > > Recording time in microseconds seemed reasonable since all obvious > values for ticks and `getrusage` are at microsecond granularity or > less > precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are > both > variables given to use by the host system I was not sure of that > enough > to make this decision. > > ------------------------------ > timer::all_zero is ignoring rows which are inconsequential to the > user > and would be printed out as all zeros. Since upon printing rows we > convert to the same double value and print out the same precision as > before, we return true/false based on the same amount of time as > before. > > timer::print_row casts to a floating point measurement in units of > seconds as was printed out before. > > timer::validate_phases -- I'm printing out nanoseconds here rather > than > floating point seconds since this is an error message for when things > have "gone wrong" printing out the actual nanoseconds that have been > recorded seems like the best approach. > N.b. since we now print out nanoseconds instead of floating point > value > the padding requirements are different. Originally we were padding > to > 24 characters and printing 18 decimal places. This looked odd with > the > now visually smaller values getting printed. I judged 13 characters > (corresponding to 2 hours) to be a reasonable point at which our > alignment could start to degrade and this provides a more compact > output > for the majority of cases (checked by triggering the error case via > GDB). > > ------------------------------ > N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe > this > would fit in an integer on all hosts that GCC supports, but am not > certain there are not strange integer sizes we support hence am > pointing > it out for special attention during review. > > ------------------------------ > No expected change in generated code. > Bootstrapped and regtested on AArch64 with no regressions. > Manually checked that flaky test is no longer flaky on the machine it > was seen before. > > gcc/ChangeLog: > > PR middle-end/110316 > PR middle-end/9903 > * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC, > CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New. > (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros. > (timer::validate_phases): Use integral arithmetic to check > validity. > (timer::print_row, timer::print): Convert from integral > nanoseconds to floating point seconds before printing. > (timer::all_zero): Change limit to nanosec count instead of > fractional count of seconds. > * timevar.h (struct timevar_time_def): Update all > measurements > to use uint64_t nanoseconds rather than seconds stored in a > double. > > > > ############### Attachment also inlined for ease of reply > ############### > > > diff --git a/gcc/timevar.h b/gcc/timevar.h > index > ad46573160957c6461190204c5181ec9e904e204..882954a527e11a90873f25a45e9 > 073149709dd05 100644 > --- a/gcc/timevar.h > +++ b/gcc/timevar.h > @@ -44,7 +44,7 @@ > */ > > /* This structure stores the various varieties of time that can be > - measured. Times are stored in seconds. The time may be an > + measured. Times are stored in nanoseconds. The time may be an > absolute time or a time difference; in the former case, the time > base is undefined, except that the difference between two times > produces a valid time difference. */ > @@ -52,14 +52,13 @@ > struct timevar_time_def > { > /* User time in this process. */ > - double user; > + uint64_t user; > > - /* System time (if applicable for this host platform) in this > - process. */ > - double sys; > + /* System time (if applicable for this host platform) in this > process. */ > + uint64_t sys; > > /* Wall clock time. */ > - double wall; > + uint64_t wall; > > /* Garbage collector memory. */ > size_t ggc_mem;
Hi Matthew. I recently touched the timevar code (in r14-2881- g75d623946d4b6e) to add support for serializing the timevar data in JSON form as part of the SARIF output (PR analyzer/109361). Looking at your patch, it looks like the baseline for the patch seems to predate r14-2881-g75d623946d4b6e. I don't have a strong opinion on the implementation choices in your patch, but please can you rebase to beyond my recent change and make sure that the SARIF serialization still works with your patch. Specifically, please try compiling with -ftime-report -fdiagnostics-format=sarif-file and have a look at the generated .sarif file, e.g. via python -m json.tool foo.c.sarif which will pretty-print the JSON to stdout. Currently I'm writing out the values as floating-point seconds, and AFAIK my analyzer integration testsuite [1] is the only consumer of this data. [...snip...] Thanks Dave [1] https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5