https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115025
--- Comment #5 from Haochen Jiang <haochen.jiang at intel dot com> ---
My guess is that the difference comes from the primality-test loop:

    for (i = 5; i < max; i += 6)
      if ((n % i == 0) || (n % (i + 2) == 0))
        return 0;

GCC 13 peels the first iteration, which is (n % 5 == 0) || (n % 7 == 0),
out of the loop and implements it with imul+cmp instead of div. Current
trunk still uses div for that iteration, so it is slower.

BTW, there is also a codegen regression that does not cause a perf
regression: on current trunk, the sqrt BBs are not merged together. This
increases code size but has no perf impact.
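To make the guess above concrete, here is a rough sketch of the loop, the same loop with its first iteration peeled by hand, and the multiply-and-compare divisibility test that can replace a division by a known odd constant. The function names, the use of uint32_t, and treating n as unsigned are assumptions made for illustration; they are not taken from the testcase in the PR, and the exact sequence GCC 13 emits may differ.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the loop from the comment.
   Names and types are assumptions, not the PR's testcase. */
static int is_prime_loop(uint32_t n, uint32_t max)
{
    for (uint32_t i = 5; i < max; i += 6)
        if ((n % i == 0) || (n % (i + 2) == 0))
            return 0;
    return 1;
}

/* Same loop with the first iteration peeled by hand. In the peeled
   copy the divisors are the constants 5 and 7, so no div is needed. */
static int is_prime_peeled(uint32_t n, uint32_t max)
{
    if (5 < max) {
        if ((n % 5 == 0) || (n % 7 == 0))
            return 0;
        for (uint32_t i = 11; i < max; i += 6)
            if ((n % i == 0) || (n % (i + 2) == 0))
                return 0;
    }
    return 1;
}

/* Divisibility by an odd constant d without division: multiply by the
   inverse of d modulo 2^32 and compare against floor((2^32 - 1) / d).
   This is the kind of imul+cmp sequence the comment refers to. */
static int divisible_by_5(uint32_t n)
{
    return (uint32_t)(n * UINT32_C(0xCCCCCCCD)) <= UINT32_C(0x33333333);
}

static int divisible_by_7(uint32_t n)
{
    return (uint32_t)(n * UINT32_C(0xB6DB6DB7)) <= UINT32_C(0x24924924);
}
```

Both loop variants return the same result for any n and max; the peeled form only exposes the constant divisors to the compiler. Comparing the assembly of the two at -O2 on GCC 13 and on trunk would be one way to check whether the guess holds.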