https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115025

--- Comment #5 from Haochen Jiang <haochen.jiang at intel dot com> ---
My guess is that the difference comes from the primality-test loop:

        for (i = 5; i < max; i += 6)
                if ((n % i == 0) || (n % (i + 2) == 0))
                        return 0;

GCC 13 peels the first iteration, which is (n % 5 == 0) || (n % 7 == 0),
out of the loop and implements it with imul+cmp instead of div.

On current trunk, however, that iteration still uses div, which is slower.
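
To illustrate what "imul+cmp instead of div" means for the peeled iteration,
here is a minimal sketch of the usual multiply-and-compare divisibility test.
It assumes n is a 32-bit unsigned value (the type is not shown above), and the
helper name divisible_by_5_or_7 is made up for this example. The exact
sequence GCC 13 emits may differ.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the test case: computes
   (n % 5 == 0) || (n % 7 == 0) with one multiply and one compare per
   divisor instead of a division. Assumes a 32-bit unsigned n. */
static int divisible_by_5_or_7(uint32_t n)
{
    /* 0xCCCCCCCD is the multiplicative inverse of 5 modulo 2^32.
       n is a multiple of 5 exactly when the wrapped product is at most
       UINT32_MAX / 5 = 0x33333333. */
    if ((uint32_t)(n * UINT32_C(0xCCCCCCCD)) <= UINT32_C(0x33333333))
        return 1;

    /* Same test for 7: the inverse is 0xB6DB6DB7 and the bound is
       UINT32_MAX / 7 = 0x24924924. */
    return (uint32_t)(n * UINT32_C(0xB6DB6DB7)) <= UINT32_C(0x24924924);
}
```

This only works when the divisor is a compile-time constant, which is why the
iteration has to be peeled out of the loop first: inside the loop i is
variable, so the compiler has to fall back to div.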

BTW, there is also a codegen regression that does not cause a perf regression:
on current trunk, the sqrt BB is not merged together. This increases code size
but has no perf impact.
