https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689

--- Comment #17 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
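I posted the patch.  With it we split the loop, but we don't get a really big
improvement from that: roughly 5% fewer cycles and less task-clock time in the
perf stat runs below (patched compiler first, then the installed trunk).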
jh@ryzen3:~/gcc/build3/gcc> ./xgcc -B ./ -Ofast c.ii -S -fopt-info 2>&1 | grep split ; perf stat ./a.out
c.C:15:9: optimized: loop split

 Performance counter stats for './a.out':

            862.32 msec task-clock:u                     #    0.978 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
            53,443      page-faults:u                    #   61.976 K/sec
     3,295,805,448      cycles:u                         #    3.822 GHz                         (83.44%)
        81,606,129      stalled-cycles-frontend:u        #    2.48% frontend cycles idle        (83.33%)
         8,205,437      stalled-cycles-backend:u         #    0.25% backend cycles idle         (83.32%)
     7,420,801,599      instructions:u                   #    2.25  insn per cycle
                                                  #    0.01  stalled cycles per insn     (82.88%)
       903,367,479      branches:u                       #    1.048 G/sec                       (83.50%)
            54,872      branch-misses:u                  #    0.01% of all branches             (83.53%)

       0.881716607 seconds time elapsed

       0.782798000 seconds user
       0.079877000 seconds sys


jh@ryzen3:~/gcc/build3/gcc> ~/trunk-install/bin/g++ -Ofast c.ii -S -fopt-info 2>&1 | grep split ; perf stat ./a.out

 Performance counter stats for './a.out':

            905.76 msec task-clock:u                     #    0.998 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
            51,910      page-faults:u                    #   57.311 K/sec
     3,459,244,533      cycles:u                         #    3.819 GHz                         (83.24%)
        83,603,137      stalled-cycles-frontend:u        #    2.42% frontend cycles idle        (83.24%)
        13,908,621      stalled-cycles-backend:u         #    0.40% backend cycles idle         (83.25%)
     7,422,922,864      instructions:u                   #    2.15  insn per cycle
                                                  #    0.01  stalled cycles per insn     (83.30%)
       899,226,266      branches:u                       #  992.791 M/sec                       (83.67%)
            52,719      branch-misses:u                  #    0.01% of all branches             (83.31%)

       0.907459830 seconds time elapsed

       0.810481000 seconds user
       0.095820000 seconds sys

The optimized dump is:
  <bb 6> [local count: 679982665]:
  # ivtmp.62_101 = PHI <1(5), ivtmp.62_16(6)>
  _122 = MEM[(value_type &)_44 + ivtmp.62_101 * 4];
  _120 = MEM[(value_type &)_41 + ivtmp.62_101 * 4];
  _119 = _120 + _122;
  _114 = MEM[(value_type &)_41 + 18446744073709551612 + ivtmp.62_101 * 4];
  _113 = _114 * _119;
  _112 = (double) _113;
  _111 = (signed int) ivtmp.62_101;
  _110 = (double) _111;
  _109 = __builtin_log (_110);
  _108 = _109 * _112;
  _107 = (float) _108;
  MEM[(value_type &)_35 + ivtmp.62_101 * 4] = _107;
  ivtmp.62_16 = ivtmp.62_101 + 1;
  if (ivtmp.62_16 != 100000000)
    goto <bb 6>; [98.42%]
  else
    goto <bb 7>; [1.58%]

which looks reasonable. The vectorizer says:
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed:   versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   conversion not supported by target.
c.ii.174t.vect:c.C:13:27: missed:   no optab.
c.ii.174t.vect:c.C:13:27: missed:   function is not vectorizable.
c.ii.174t.vect:/home/jh/trunk-install/include/c++/14.0.0/cmath:353:27: missed:   not vectorized: relevant stmt not supported: _109 = __builtin_log (_110);
c.ii.174t.vect:c.C:13:27: missed:  bad operation or unsupported loop bound.
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed:   versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   Unknown misalignment, naturally aligned
c.ii.174t.vect:c.C:13:27: missed:   conversion not supported by target.
c.ii.174t.vect:c.C:13:27: missed:   no optab.
c.ii.174t.vect:c.C:13:27: missed:   function is not vectorizable.
c.ii.174t.vect:/home/jh/trunk-install/include/c++/14.0.0/cmath:353:27: missed:   not vectorized: relevant stmt not supported: _109 = __builtin_log (_110);
c.ii.174t.vect:c.C:13:27: missed:  bad operation or unsupported loop bound.
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_123 and *_106
c.ii.174t.vect:c.C:18:26: missed:   versioning for alias required: can't determine dependence between *_121 and *_106
c.ii.174t.vect:c.C:18:34: missed:   versioning for alias required: can't determine dependence between *_115 and *_106
c.ii.174t.vect:c.C:18:34: missed:   not vectorized: unsupported data-type double
c.ii.174t.vect:c.C:13:27: missed:  can't determine vectorization factor.
c.ii.174t.vect:c.C:13:27: missed:   not vectorized: no vectype for stmt: _122 = *_123;
c.ii.174t.vect:c.C:18:26: missed:   not vectorized: no vectype for stmt: _122 = *_123;
c.ii.174t.vect:c.C:13:27: missed:  bad data references.
c.ii.174t.vect:c.C:13:27: missed: couldn't vectorize loop
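
For readers following the dumps, here is a minimal sketch of what the loop in
the optimized dump appears to compute, reconstructed only from the GIMPLE
quoted above. The function name kernel and the parameter names a, b and out
are hypothetical stand-ins for the arrays based at _44, _41 and _35; the real
source in c.C is not quoted in this comment and may well look different. The
sketch covers only the iterations starting at 1 that the dump shows.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reconstruction from the optimized dump; not the actual c.C.
// a ~ array at _44, b ~ array at _41, out ~ array at _35 (assumed names).
void kernel(const std::vector<float>& a,
            const std::vector<float>& b,
            std::vector<float>& out)
{
    assert(a.size() == out.size() && b.size() == out.size());
    const unsigned n = static_cast<unsigned>(out.size());
    for (unsigned i = 1; i < n; ++i) {
        // _119 = a[i] + b[i];  _113 = b[i-1] * _119  (both in float)
        const float prod = b[i - 1] * (a[i] + b[i]);
        // _109 = __builtin_log ((double) (signed int) i)
        const double l = std::log(static_cast<double>(static_cast<int>(i)));
        // _108 = _109 * (double) _113;  _107 = (float) _108
        out[i] = static_cast<float>(l * static_cast<double>(prod));
    }
}
```

The float -> double -> float round trip around the scalar log call is the part
the vectorizer output above complains about ("relevant stmt not supported:
_109 = __builtin_log (_110)"), and the three separate arrays are what make it
ask for alias versioning.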

At loop vectorization time the counter goes backwards:
  <bb 36> [local count: 679982665]:
  # i_127 = PHI <1(7), i_17(40)>
  # ivtmp_31 = PHI <99999999(7), ivtmp_28(40)>
  _125 = (long unsigned int) i_127;
  _124 = _125 * 4;
  _123 = _44 + _124;
  _122 = *_123;
  _121 = _41 + _124;
  _120 = *_121;
  _119 = _122 + _120;
  _118 = i_127 + 4294967295;
  _117 = (long unsigned int) _118;
  _116 = _117 * 4;
  _115 = _41 + _116;
  _114 = *_115;
  _113 = _119 * _114;
  _112 = (double) _113; 
  _111 = (signed int) i_127;
  _110 = (double) _111; 
  _109 = __builtin_log (_110);
  _108 = _112 * _109;
  _107 = (float) _108;
  _106 = _35 + _124;
  *_106 = _107;
  i_17 = i_127 + 1;
  ivtmp_28 = ivtmp_31 - 1;
  if (ivtmp_28 != 0)
    goto <bb 40>; [98.42%]
  else
    goto <bb 12>; [1.58%]

  <bb 40> [local count: 669250617]:
  goto <bb 36>; [100.00%]
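
The large constants in the two dumps above are just small negative offsets
written as unsigned values. A short sketch, assuming the usual 32-bit unsigned
int and 64-bit pointer-sized arithmetic; the helper names prev_index and
prev_byte_offset are made up for illustration:

```cpp
#include <cstdint>

// _118 = i_127 + 4294967295: adding 2^32 - 1 in 32-bit unsigned
// arithmetic wraps around and yields i - 1.
constexpr std::uint32_t prev_index(std::uint32_t i)
{
    return static_cast<std::uint32_t>(i + 4294967295u);
}

// MEM[_41 + 18446744073709551612 + ivtmp * 4]: 2^64 - 4 acts as -4,
// so this is the byte offset of element ivtmp - 1 of a float array.
constexpr std::uint64_t prev_byte_offset(std::uint64_t ivtmp)
{
    return 18446744073709551612ull + ivtmp * 4u;
}

static_assert(prev_index(5) == 4, "i + (2^32 - 1) wraps to i - 1");
static_assert(prev_byte_offset(5) == 16, "offset of element 4");

// Both loop forms run the same number of times: the optimized dump counts
// up from 1 until it reaches 100000000, while the vectorizer-time dump
// counts ivtmp_31 down from 99999999 to 0.
static_assert(100000000u - 1u == 99999999u, "same trip count");
```

So both dumps read the element before the current one from the array at _41,
and the downward counter is only a different way of expressing the same trip
count.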
