Hi,

> You see that there is a great float for
> IfExprLongColumnLongColumnBench.bench, the float is 583775 and the average
> value is 1621602.

In my tests, the single-core tests tended to show huge variation on Intel CPUs with Turbo Boost. CPU operations that are fast when stressing the CPU in single-threaded mode tended to get really slow when the other cores spin up and the chip hits its thermal limits. For most memory-bound operations this is not easily visible, but the better pipelined and vectorized the loops get, the worse the impact of dynamic CPU frequency scaling.

Can you collect the active CPU frequency when running this benchmark and use "taskset -c 1" to force the run to stick to a single CPU?

Cheers,
Gopal
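
P.S. A minimal sketch of the measurement asked for above, assuming Linux on x86, where /proc/cpuinfo reports one "cpu MHz" line per logical CPU in processor order, and assuming taskset is installed. The helper names (parse_cpu_mhz, read_cpu_mhz, run_pinned) are made up for illustration, and the benchmark command is whatever you normally use to launch the run.

```python
import re
import subprocess
import time


def parse_cpu_mhz(cpuinfo_text):
    """Extract the 'cpu MHz' values from /proc/cpuinfo text.

    Assumption: entries appear in logical-CPU order, so index 1 is CPU 1.
    """
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current per-CPU frequencies (MHz) from the kernel."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())


def run_pinned(cmd, cpu=1, interval=0.5):
    """Run cmd under `taskset -c <cpu>` and sample that CPU's frequency.

    Returns (exit_code, samples), where samples is a list of MHz readings
    taken every `interval` seconds while the command is running.
    """
    samples = []
    proc = subprocess.Popen(["taskset", "-c", str(cpu)] + list(cmd))
    while proc.poll() is None:
        mhz = read_cpu_mhz()
        if cpu < len(mhz):
            samples.append(mhz[cpu])
        time.sleep(interval)
    return proc.returncode, samples


# Example (placeholder command, replace with your benchmark launcher):
#   code, samples = run_pinned(["your-benchmark-command", "args"])
#   print(min(samples), max(samples))
```

If the minimum and maximum readings differ a lot between runs, that would point at frequency scaling rather than the benchmark itself. Note that the sampling loop also uses some CPU, so it is best left on a different core from the pinned one.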