https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604

--- Comment #21 from amker at gcc dot gnu.org ---
(In reply to Alexander Nesterovskiy from comment #20)
> I've made test runs on Broadwell and Skylake, RHEL 7.3.
> 410.bwaves became faster after r256990 but not as fast as it was on r253678. 
> Comparing 410.bwaves performance, "-Ofast -funroll-loops -flto
> -ftree-parallelize-loops=4": 
> 
> rev       perf. relative to r253678, %
> r253678   100%
> r253679   54%
> ...
> r256989   54%
> r256990   71%
> 
> CPU time distribution became more flat (~34% thread0, ~22% - threads1-3),
> but a lot of time is spent spinning in 
> libgomp.so.1.0.0/gomp_barrier_wait_end -> do_wait -> do_spin
> and
> libgomp.so.1.0.0/gomp_team_barrier_wait_end -> do_wait -> do_spin 
> r253678 spin time is ~10% of CPU time 
> r256990 spin time is ~30% of CPU time

I'm not familiar with libgomp. Does this mean we now spend more time synchronizing threads?
Thanks.
