https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689
--- Comment #17 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I posted the patch. With it we split the loop, but we don't get really big
improvements from that

jh@ryzen3:~/gcc/build3/gcc> ./xgcc -B ./ -Ofast c.ii -S -fopt-info 2>&1 | grep split ; perf stat ./a.out
c.C:15:9: optimized: loop split

 Performance counter stats for './a.out':

            862.32 msec task-clock:u              #    0.978 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            53,443      page-faults:u             #   61.976 K/sec
     3,295,805,448      cycles:u                  #    3.822 GHz                      (83.44%)
        81,606,129      stalled-cycles-frontend:u #    2.48% frontend cycles idle     (83.33%)
         8,205,437      stalled-cycles-backend:u  #    0.25% backend cycles idle      (83.32%)
     7,420,801,599      instructions:u            #    2.25  insn per cycle
                                                  #    0.01  stalled cycles per insn  (82.88%)
       903,367,479      branches:u                #    1.048 G/sec                    (83.50%)
            54,872      branch-misses:u           #    0.01% of all branches          (83.53%)

       0.881716607 seconds time elapsed

       0.782798000 seconds user
       0.079877000 seconds sys

jh@ryzen3:~/gcc/build3/gcc> ~/trunk-install/bin/g++ -Ofast c.ii -S -fopt-info 2>&1 | grep split ; perf stat ./a.out

 Performance counter stats for './a.out':

            905.76 msec task-clock:u              #    0.998 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
            51,910      page-faults:u             #   57.311 K/sec
     3,459,244,533      cycles:u                  #    3.819 GHz                      (83.24%)
        83,603,137      stalled-cycles-frontend:u #    2.42% frontend cycles idle     (83.24%)
        13,908,621      stalled-cycles-backend:u  #    0.40% backend cycles idle      (83.25%)
     7,422,922,864      instructions:u            #    2.15  insn per cycle
                                                  #    0.01  stalled cycles per insn  (83.30%)
       899,226,266      branches:u                #  992.791 M/sec                    (83.67%)
            52,719      branch-misses:u           #    0.01% of all branches          (83.31%)

       0.907459830 seconds time elapsed

       0.810481000 seconds user
       0.095820000 seconds sys

optimized dump is:

  <bb 6> [local count: 679982665]:
  # ivtmp.62_101 = PHI <1(5), ivtmp.62_16(6)>
  _122 = MEM[(value_type &)_44 + ivtmp.62_101 * 4];
  _120 = MEM[(value_type &)_41 + ivtmp.62_101 * 4];
  _119 = _120 + _122;
  _114 = MEM[(value_type &)_41 + 18446744073709551612 + ivtmp.62_101 * 4];
  _113 = _114 * _119;
  _112 = (double) _113;
  _111 = (signed int) ivtmp.62_101;
  _110 = (double) _111;
  _109 = __builtin_log (_110);
  _108 = _109 * _112;
  _107 = (float) _108;
  MEM[(value_type &)_35 + ivtmp.62_101 * 4] = _107;
  ivtmp.62_16 = ivtmp.62_101 + 1;
  if (ivtmp.62_16 != 100000000)
    goto <bb 6>; [98.42%]
  else
    goto <bb 7>; [1.58%]

which looks reasonable. Vectorizer says

c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed: versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: conversion not supported by target.
c.ii.174t.vect:c.C:13:27: missed: no optab.
c.ii.174t.vect:c.C:13:27: missed: function is not vectorizable.
c.ii.174t.vect:/home/jh/trunk-install/include/c++/14.0.0/cmath:353:27: missed: not vectorized: relevant stmt not supported: _109 = __builtin_log (_110);
c.ii.174t.vect:c.C:13:27: missed: bad operation or unsupported loop bound.
c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed: versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed: conversion not supported by target.
c.ii.174t.vect:c.C:13:27: missed: no optab.
c.ii.174t.vect:c.C:13:27: missed: function is not vectorizable.
c.ii.174t.vect:/home/jh/trunk-install/include/c++/14.0.0/cmath:353:27: missed: not vectorized: relevant stmt not supported: _109 = __builtin_log (_110);
c.ii.174t.vect:c.C:13:27: missed: bad operation or unsupported loop bound.
c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed: versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed: versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:18:34: missed: not vectorized: unsupported data-type double
c.ii.174t.vect:c.C:13:27: missed: can't determine vectorization factor.
c.ii.174t.vect:c.C:13:27: missed: not vectorized: no vectype for stmt: _122 = *_123;
c.ii.174t.vect:c.C:18:26: missed: not vectorized: no vectype for stmt: _122 = *_123;
c.ii.174t.vect:c.C:13:27: missed: bad data references.
c.ii.174t.vect:c.C:13:27: missed: couldn't vectorize loop

At loop vectorization time the counter goes backwards:

  <bb 36> [local count: 679982665]:
  # i_127 = PHI <1(7), i_17(40)>
  # ivtmp_31 = PHI <99999999(7), ivtmp_28(40)>
  _125 = (long unsigned int) i_127;
  _124 = _125 * 4;
  _123 = _44 + _124;
  _122 = *_123;
  _121 = _41 + _124;
  _120 = *_121;
  _119 = _122 + _120;
  _118 = i_127 + 4294967295;
  _117 = (long unsigned int) _118;
  _116 = _117 * 4;
  _115 = _41 + _116;
  _114 = *_115;
  _113 = _119 * _114;
  _112 = (double) _113;
  _111 = (signed int) i_127;
  _110 = (double) _111;
  _109 = __builtin_log (_110);
  _108 = _112 * _109;
  _107 = (float) _108;
  _106 = _35 + _124;
  *_106 = _107;
  i_17 = i_127 + 1;
  ivtmp_28 = ivtmp_31 - 1;
  if (ivtmp_28 != 0)
    goto <bb 40>; [98.42%]
  else
    goto <bb 12>; [1.58%]

  <bb 40> [local count: 669250617]:
  goto <bb 36>; [100.00%]
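
For readers without the testcase at hand, here is a rough C++ sketch of what the loop body in the dumps above computes, read off the GIMPLE only. It is not the testcase from this PR: the names `guarded_loop`, `split_loop`, `a` and `b` are made up, and the `i > 0` guard with a zero result for the first element is an assumption about the kind of condition loop splitting removes. Only the arithmetic (float add and multiply, widening to double, `log` of the index cast to int, narrowing back to float) follows the dump.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical unsplit form: the i > 0 test is evaluated on every iteration.
// The guard itself is an assumption; the dumps only show the loop from 1.
std::vector<float> guarded_loop(const std::vector<float>& a,
                                const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size(), 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (i > 0) {
            float prod = (a[i] + b[i]) * b[i - 1];             // _119, _113
            double lg = std::log(double(static_cast<int>(i))); // _111, _110, _109
            out[i] = static_cast<float>(double(prod) * lg);    // _108, _107
        }
    }
    return out;
}

// Split form: the loop starts at 1, so the body carries no guard.
// This matches the shape of <bb 6>, whose induction variable starts at 1.
std::vector<float> split_loop(const std::vector<float>& a,
                              const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size(), 0.0f);
    for (std::size_t i = 1; i < a.size(); ++i) {
        float prod = (a[i] + b[i]) * b[i - 1];
        double lg = std::log(double(static_cast<int>(i)));
        out[i] = static_cast<float>(double(prod) * lg);
    }
    return out;
}
```

Both functions compute the same values; the second just has no test left in the body. In either form the `log` call and the float/double conversions stay inside the loop, which are the statements the vectorizer messages above complain about.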