Below are two benchmarks that explore maximum floating-point performance. loopm6 uses double-precision floating point and loopm6fp uses parallel single-precision. Both are manually unrolled multiply-add loops.
I used to reach 2.8 and 11 GFlops on these, respectively. Now I only get 2 and 6. If you inspect the inner loop with gcc -O2 -S, you can see that it seems to use few registers. Each program expects one parameter when run; I use values between 30000 and 50000. I compile with gcc -O2, using gcc 4.9.2-3 on 64-bit.
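For readers without the attachments, here is a rough sketch of what a manually unrolled multiply-add loop and its GFlops arithmetic can look like. It is not the code in loopm6.c: the function names (madd_kernel, gflops, run_benchmark), the constants, the choice of four chains, and the use of clock() for timing are all assumptions made for illustration.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical kernel (NOT the attached loopm6.c): four independent
   multiply-add chains, unrolled by hand so the compiler can keep each
   accumulator in its own register and overlap the operations. */
double madd_kernel(long iters)
{
    double a0 = 1.0, a1 = 2.0, a2 = 3.0, a3 = 4.0;
    const double m = 0.999999, c = 0.000001;

    for (long i = 0; i < iters; i++) {
        a0 = a0 * m + c;
        a1 = a1 * m + c;
        a2 = a2 * m + c;
        a3 = a3 * m + c;
    }
    /* Return the sum so the compiler cannot discard the loop. */
    return a0 + a1 + a2 + a3;
}

/* 4 chains x (1 multiply + 1 add) = 8 floating-point operations per
   iteration. Returns 0 if the measured time is not positive. */
double gflops(long iters, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return 8.0 * (double)iters / seconds / 1e9;
}

/* Time one run with clock() (CPU time) and print the result. */
double run_benchmark(long iters)
{
    clock_t start = clock();
    double result = madd_kernel(iters);
    clock_t stop = clock();

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    printf("result=%g time=%.3f s rate=%.2f GFlops\n",
           result, seconds, gflops(iters, seconds));
    return result;
}
```

To try it, call run_benchmark() from a main() with a large iteration count, build with gcc -O2, and compare the inner loop in the gcc -O2 -S output across compiler versions to see how many registers are actually used. Whether the parameter of the attached programs is an iteration count is not something I can tell from the post, so treat that mapping as a guess.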
Attachments: loopm6.c, loopm6fp.c, timers.h