On Wed, Feb 26, 2020 at 12:50:35AM +0000, Basile B. via Digitalmars-d-learn wrote:
[...]
> #!dmd -boundscheck=off -O -release -inline
[...]

TBH, I'm skeptical of any performance results obtained with dmd. I wouldn't pay attention to performance numbers obtained this way, and would rather look at the ldmd/ldc2 numbers.

[...]
> Then bad surprise. Even with ldmd (so ldc2 basically) feeded with the
> args from the script line. Maybe the fdecimalLength9 version is
> slightly faster. Only *slightly*. Good news, I've lost my time. So I
> try an alternative version that uses a table of delegates instead of a
> switch (ffdecimalLength9) and surprise, "tada", it is like **100x**
> slower then the two others.
>
> How is that possible ?

Did you check the assembly output to see what the difference is?

Delegates involve a function call, which means function call overhead, which includes a CPU pipeline hazard. Worse yet, it's an indirect call, which is much harder on the CPU's branch predictor and can cost you instruction cache misses. On top of that, a delegate that captures local variables may need a heap-allocated context, and you *really* do not want allocations inside an inner loop.

And of course, normal function calls are easier for compilers to inline, because the destination is fixed. Indirect calls through delegates are hard to predict, and the optimizer is more liable to just give up.

These are just my educated guesses, of course. For the real answer, look at the assembly output. :-D


T

--
What are you when you run out of Monet? Baroque.
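
To make the switch-versus-delegate-table comparison concrete, here is a minimal sketch of how one might time the two dispatch styles side by side. None of the names below (viaSwitch, viaDelegates, makeTable, Op) come from the original post; they are hypothetical stand-ins, and the four toy operations are not the fdecimalLength9 logic. Treat the timings as indicative only: with a table this small and constant, an optimizing compiler may well see through the indirection, so the assembly remains the real answer.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical delegate type used for the table-based dispatch.
alias Op = uint delegate(uint);

// Direct dispatch: every target is visible to the optimizer, so it can inline.
uint viaSwitch(uint sel, uint x)
{
    switch (sel & 3)
    {
        case 0:  return x + 1;
        case 1:  return x ^ 0x55;
        case 2:  return x * 3;
        default: return x - 7;
    }
}

// The same four operations, stored as delegates in a table.
// These literals capture nothing, so no closure needs to be allocated.
Op[4] makeTable()
{
    Op[4] table;
    table[0] = delegate uint(uint x) { return x + 1; };
    table[1] = delegate uint(uint x) { return x ^ 0x55; };
    table[2] = delegate uint(uint x) { return x * 3; };
    table[3] = delegate uint(uint x) { return x - 7; };
    return table;
}

// Indirect dispatch: the call target is only known at run time.
uint viaDelegates(Op[] table, uint sel, uint x)
{
    return table[sel & 3](x);
}

void main()
{
    enum uint n = 10_000_000;
    auto table = makeTable();
    uint sinkA, sinkB; // results are kept and printed so the loops are not dead code

    void runSwitch()
    {
        foreach (uint i; 0 .. n)
            sinkA = viaSwitch(i, sinkA);
    }

    void runDelegates()
    {
        foreach (uint i; 0 .. n)
            sinkB = viaDelegates(table[], i, sinkB);
    }

    // Run each loop once and collect the elapsed time for each.
    auto times = benchmark!(runSwitch, runDelegates)(1);
    assert(sinkA == sinkB); // both paths must compute the same value

    writefln("switch:    %s", times[0]);
    writefln("delegates: %s", times[1]);
    writefln("checksum:  %s", sinkA);
}
```

Building this with ldc2 and the optimization flags from the quoted script line, then asking the compiler for assembly (ldc2 has an -output-s switch for that), should show whether the delegate calls were inlined, devirtualized, or left as indirect calls.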