https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83017

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so we do slightly better for the runtime test than for the static test:

      if (loop->inner)
        m_p_thread=2;
      else
        m_p_thread=MIN_PER_THREAD;

so with 2 threads we should have exactly 2 iterations per thread, but the
runtime check uses the number of latch executions, which is 3, and thus
arrives at 1 iteration per thread.  Fixing this off-by-one gets us

> /usr/bin/time ./a.out 
 PI   2.98876095    
 PI   3.14159274    
4.02user 0.00system 0:04.02elapsed 99%CPU (0avgtext+0avgdata 2460maxresident)k
0inputs+0outputs (0major+102minor)pagefaults 0swaps

vs.

> /usr/bin/time ./a.out 
 PI   8.59536934    
 PI   3.14159274    
10.90user 0.00system 0:05.54elapsed 196%CPU (0avgtext+0avgdata 2840maxresident)k
0inputs+0outputs (0major+126minor)pagefaults 0swaps


I guess the different computation outcome means we're doing something wrong
somewhere.
Also at least on my machine the result isn't any faster (when parallelizing
the outer loop).  As usual auto-parallelization may harm followup transforms.
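
The off-by-one in the runtime check can be illustrated with a minimal
sketch.  This is not the code in the parallelizer; the helper names are
hypothetical, and it assumes the loop body runs once more than the latch
(so 3 latch executions means 4 iterations) and that the check is a plain
integer division against the per-thread minimum.

```c
#include <stdbool.h>

/* Hypothetical helper: assumes the body runs once more than the latch.  */
static unsigned
iterations_from_latch (unsigned latch_executions)
{
  return latch_executions + 1;
}

/* Hypothetical helper: iterations each thread gets, rounded down.  */
static unsigned
iters_per_thread (unsigned count, unsigned n_threads)
{
  return count / n_threads;
}

/* Hypothetical stand-in for the runtime profitability check:
   parallelize only if every thread gets at least MIN_PER_THREAD
   iterations of COUNT.  */
static bool
should_parallelize (unsigned count, unsigned n_threads,
                    unsigned min_per_thread)
{
  return iters_per_thread (count, n_threads) >= min_per_thread;
}
```

With the numbers from this comment (2 threads, a minimum of 2 per thread),
should_parallelize (3, 2, 2) is false because 3 / 2 rounds down to 1, while
should_parallelize (iterations_from_latch (3), 2, 2) is true because
4 / 2 is 2.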
