I wrote:
> "Richard Biener" <richard.guent...@gmail.com> wrote:
[...]
>> Whether or not the branch is predicted taken does not matter, what
>> matters is that the continuation is not data dependent on the branch
>> target computation and thus can execute in parallel to it.
>
> My benchmark shows that this doesn't matter!

>>> I already presented measured numbers: with random data, the branch-free
>>> code is faster, with ordered data the original code.
[...]
>> We do know that mispredicted branches are bad.
>> You show well-predicted branches are good.
>
> Wrong: I show that no branches are still better.
>
>> By simple statistics singling out 4 out of 255 values will make the
>> branches well-predicted.
>
> Your statistic is wrong:
> 1. the branch singles out 224 of 256 values, i.e. 7/8 of all data;
> 2. whitespace lies in the 1/8 which is not singled out.
>
>>> Now perform a linear interpolation and find the break-even point at
>>> p=0.4, with p=0 for ordered data and p=1 for random data, or just use
>>> the average of these numbers: 2.9 cycles vs. 2.75 cycles.
>>> That's small, but measurable!

I recommend you ALL start to read Daniel Lemire's article
<https://www.infoq.com/articles/making-code-faster-taming-branches/>

| Even when it is impossible to remove all branches, reducing the number
| of branches "almost always taken" or "almost never taken" may help the
| processor better predict the remaining branches.

JFTR: I didn't know his article before, but I hope that you are willing
to learn.

Stefan