On 01/03/2023 12:25, J. Gareth Moreton via fpc-devel wrote:
> My peephole optimisations mostly save only a handful of cycles each
> time, which probably won't add up to much for a relatively short test.
> The biggest optimisation I can think of, although I'm not quite
> sure when it was merged, is the method of replacing divisions by a
> constant with an equivalent reciprocal multiplication. You'll see the
> biggest savings there. There are other difficulties, like processors
> being intelligent with caching and out-of-order execution, for
> example, that disguise some inefficiencies. And some optimisations seek
> only to reduce code size with no loss of speed.
> What are your timings like when compiling with COREAVX or COREAVX2? A
> couple of recent peephole optimizations make use of BMI1 and BMI2.
I had -CpCOREAVX2 supplied. (my fpc is a good week old, so if recent is
less than that...)
I don't have many divisions in that code.
Most of the work is going through big data in memory. So it's possible
that any gained processing speed just has to wait for data to be fetched.
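For readers unfamiliar with the division optimisation mentioned above, here is a minimal sketch of the general idea in C, not the compiler's actual code. It assumes unsigned 32-bit operands and a divisor of 10; the names div10_by_reciprocal and check_div10 are made up for illustration. The multiply-and-shift replaces the much slower divide instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Computes x / 10 for any 32-bit unsigned x without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so multiplying by it in 64 bits and
   shifting right by 35 bits yields the truncated quotient. */
static uint32_t div10_by_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical self-check: compares against ordinary division on a
   coarse sweep of the 32-bit range plus the largest value. */
static void check_div10(void)
{
    uint32_t x = 0;
    for (;;) {
        assert(div10_by_reciprocal(x) == x / 10u);
        if (x > UINT32_MAX - 65521u)
            break;              /* next step would overflow */
        x += 65521u;
    }
    assert(div10_by_reciprocal(UINT32_MAX) == UINT32_MAX / 10u);
}
```

Calling check_div10() from a test program should pass silently. Code with few divisions by constants, like mine, would not be expected to gain much from this kind of rewrite.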
> I can't remember the proverb that Florian used, but it essentially
> boils down to very small changes, individually not amounting to much,
> but which accumulate into a noticeable difference when in large numbers.
Hence testing back to 3.2.3 (unfortunately 3.2.2 has a bug that
matters in this code).
Also, I didn't expect any huge diffs, I just wanted to see if anything can
be noted at all (and, if lucky, in the test I ran).
I did a test on a more limited scope (testing only a handful of
functions). That test runs 4 min 20 sec under 3.2.3,
and takes 2 extra seconds with 3.3.1. But then I only had 2 sample runs for
each fpc version...
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel