https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102684
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is what optimized looks like on the trunk (after the main->f change):

  <bb 2> [local count: 1073741824]:
  _54 = operator new (16);
  MEM <uint128_t> [(char * {ref-all})_54] = 0xa000000060000000400000002;

  <bb 3> [local count: 3043903040]:
  # __m_103 = PHI <1(2), __m_85(6)>
  # __n_104 = PHI <5(2), __n_111(6)>
  if (__m_103 > __n_104)
    goto <bb 7>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 1521951520]:
  __n_107 = __n_104 - __m_103;
  if (__n_107 == 0)
    goto <bb 5>; [22.00%]
  else
    goto <bb 6>; [78.00%]

  <bb 5> [local count: 1014634349]:
  _109 = __m_103 << 1;
  _41 = (int) _109;
  operator delete (_54, 16);
  return _41;

  <bb 6> [local count: 2709073704]:
  # __m_85 = PHI <__m_103(4), __n_104(7)>
  # __n_30 = PHI <__n_107(4), __n_87(7)>
  _110 = __builtin_ctz (__n_30);
  __n_111 = __n_30 >> _110;
  goto <bb 3>; [100.00%]

  <bb 7> [local count: 1521951520]:
  __n_87 = __m_103 - __n_104;
  goto <bb 6>; [100.00%]

There seems to be some VRP missing inside the loop.
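
For reference, here is a rough hand transcription of the loop in the dump (bb 3 to bb 7) into C++. It is only a sketch: the name loop_from_dump, the unsigned int type, and the shift parameter are assumptions made for illustration and do not come from the testcase. Called with the PHI entry values from bb 2 (m = 1, n = 5) and the shift of 1 from bb 5, it exits on the second trip through the loop and returns 2, so the result is in principle known at compile time.

```cpp
#include <cassert>

// Hand transcription of bb 3 .. bb 7 from the optimized dump.
// m and n are the values entering the loop (1 and 5 in the dump);
// shift is the constant left shift applied in bb 5 (1 in the dump).
// The unsigned int type is an assumption; the dump does not show it.
unsigned loop_from_dump(unsigned m, unsigned n, unsigned shift)
{
    assert(m != 0 && n != 0);       // __builtin_ctz (0) is undefined
    for (;;) {
        if (m > n) {                // bb 3 -> bb 7
            unsigned t = m - n;     // __n_87 = __m_103 - __n_104
            m = n;                  // __m_85 = __n_104
            n = t;                  // __n_30 = __n_87
        } else {                    // bb 3 -> bb 4
            n -= m;                 // __n_107 = __n_104 - __m_103
            if (n == 0)
                return m << shift;  // bb 5
        }
        n >>= __builtin_ctz(n);     // bb 6; n is nonzero on both paths here
    }
}
```

Tracing it by hand: (m, n) = (1, 5) gives n = 4, then 4 >> ctz(4) = 1; the next trip gives n = 0 and returns 1 << 1 = 2.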