On Fri, Sep 27, 2024 at 02:50:13PM +1200, David Rowley wrote:
> I had been looking at [1] (which I've added your version to now). I
> had been surprised to see gcc emitting different code for the first 3
> versions. Clang does a better job at figuring out they all do the same
> thing and emitting the same code for each.

Interesting.

> I played around with the attached (hacked up) qsort.c to see if there
> was any difference. Likely function call overhead kills the
> performance anyway. There does not seem to be much difference between
> them. I've not tested with an inlined comparison function.

I'd expect worse performance with the branchless routines for the inlined
case.  However, I recall that clang was able to optimize med3() as well as
it can with the branching routines, so that may not always be true.

> Looking at your version, it doesn't look like there's any sort of
> improvement in terms of the instructions. Certainly, for clang, it's
> worse as it adds a shift left instruction and an additional compare.
> No jumps, at least.

I think I may have forgotten to add -O2 when I was inspecting this code
with godbolt.org earlier.  *facepalm*  The different versions look pretty
comparable with that added.

> What's your reasoning for returning INT_MIN and INT_MAX?

That's just for the compile option added by commit c87cb5f, which IIUC is
intended to test that we correctly handle comparisons that return INT_MIN.

-- 
nathan
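
For anyone following along without the attachment, here is a rough sketch of
the kinds of comparators being discussed.  These are not the versions from
the godbolt link or the patch; the function names (cmp_branching,
cmp_branchless, cmp_extreme, cmp_reversed) are made up for illustration.
The last two show why a comparator returning INT_MIN is worth testing:
callers that invert the sort order by negating the result would overflow,
while swapping the arguments is safe.

```c
#include <limits.h>

/* Branching three-way comparison (illustrative name, not from the patch). */
int
cmp_branching(int a, int b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}

/*
 * Branchless variant: each relational expression evaluates to 0 or 1,
 * so the difference is -1, 0, or 1 and cannot overflow.
 */
int
cmp_branchless(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * Hypothetical stress variant that returns the extreme int values, to
 * exercise callers that assume the result is always -1, 0, or 1.
 */
int
cmp_extreme(int a, int b)
{
	if (a < b)
		return INT_MIN;
	if (a > b)
		return INT_MAX;
	return 0;
}

/*
 * Reverse the sort order by swapping the arguments.  Writing
 * -cmp_extreme(a, b) instead would be signed overflow (undefined
 * behavior) whenever the comparator returns INT_MIN.
 */
int
cmp_reversed(int a, int b)
{
	return cmp_extreme(b, a);
}
```

With these definitions, cmp_branching() and cmp_branchless() agree on every
pair of inputs, and cmp_reversed(1, 2) yields INT_MAX without ever negating
INT_MIN.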