On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
> So after reading the translation of RYU I was interested to
> see if the decimalLength() function can be written to be
> faster, as it cascades up to 8 CMP.
> ...
> Then a bad surprise, even with ldmd (so basically ldc2) fed
> with the args from the script line: the fdecimalLength9
> version is maybe slightly faster. Only *slightly*. Good news,
> I've wasted my time. So I tried an alternative version that
> uses a table of delegates instead of a switch
> (ffdecimalLength9) and surprise, "tada", it is like **100x**
> slower than the two others.
> How is that possible?

Hi Basile,
I recently saw this presentation:
https://www.youtube.com/watch?v=Czr5dBfs72U
It has some ideas that may help you make sure your measurements
are good and may give you ideas to find the performance
bottleneck or where to optimize.
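
To make the measurement point concrete, here is a minimal sketch of a timing harness, not your actual code. The body of decimalLength9 is my assumption of what the comparison cascade looks like, and the xorshift input generator, the buffer size and the repetition count are arbitrary choices for illustration. The two things it tries to guard against are inputs the branch predictor can learn trivially and results the optimizer can discard.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Assumed shape of the cascade: up to 8 comparisons for a value
// known to have at most 9 decimal digits.
uint decimalLength9(const uint v) pure nothrow @nogc @safe
{
    if (v >= 100_000_000) return 9;
    if (v >= 10_000_000) return 8;
    if (v >= 1_000_000) return 7;
    if (v >= 100_000) return 6;
    if (v >= 10_000) return 5;
    if (v >= 1_000) return 4;
    if (v >= 100) return 3;
    if (v >= 10) return 2;
    return 1;
}

void main()
{
    // Pre-generate inputs of mixed magnitude with a small xorshift
    // generator, so the generation cost is not part of the timing.
    auto inputs = new uint[](1 << 16);
    uint state = 0x9E3779B9;
    foreach (ref x; inputs)
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        // The variable shift spreads the values over different digit counts.
        x = (state >> (state & 31)) % 1_000_000_000;
    }

    // Accumulate into a value that is printed afterwards, so the
    // calls cannot be removed as dead code.
    ulong sink;
    auto results = benchmark!({
        foreach (x; inputs)
            sink += decimalLength9(x);
    })(1_000);

    writefln("cascade: %s total, checksum %s", results[0], sink);
}
```

Adding the other variants as further arguments to benchmark!(...) and building with something like ldc2 -O3 -release would time them all on the same inputs.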
llvm-mca is featured on godbolt.org:
https://mca.godbolt.org/z/YWp3yv
cheers,
Johan