Hi,

I ran some tests on the soon-to-be-committed matmul patch. Specifically, I tried out the effect of putting -march=native into libgfortran's Makefile.
Here is the performance data with the new code, without -march=native. The interesting numbers are the ones for Matmul fixed explicit, for size >= 32.

========================================================
================   MEASURED GIGAFLOPS   ================
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.067      0.079      0.053      0.064
    4   5000      0.440      0.444      0.364      0.434
    8   5000      1.405      1.152      1.368      1.495
   16   5000      2.805      1.885      3.172      3.444
   32   5000      4.943      3.627      7.267      7.510
   64   5000      9.037      4.028      9.036      9.157
  128   3829     10.181      4.452      9.932     10.333
  256    477     10.398      4.720     10.919     11.158
  512     59     11.173      4.853     11.172     11.356
 1024      7     11.074      3.616     11.075     11.266

With -march=native:

========================================================
================   MEASURED GIGAFLOPS   ================
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.064      0.080      0.051      0.064
    4   5000      0.406      0.450      0.347      0.407
    8   5000      1.342      1.124      1.364      1.437
   16   5000      2.989      1.865      3.427      3.760
   32   5000      5.543      3.481      8.203      8.700
   64   5000     11.632      4.021     11.647     11.729
  128   3829     13.968      4.372     13.966     14.046
  256    477     15.778      4.717     15.780     15.761
  512     59     16.102      4.855     16.075     16.109
 1024      7     15.867      3.596     15.884     15.886

So, there could be quite some gain in performance if this could be exploited: in the fixed explicit column, between about 12% at size 32 and about 50% at size 256. I suspect the gain would be even larger on CPUs with AVX-512.

Do you think this is worth pursuing? If so, how could/should this be implemented? Does anybody do this kind of thing in gcc yet?

Regards

	Thomas
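P.S. To make the "how could this be implemented" question more concrete, here is a minimal sketch of one possible approach: compile the same kernel twice, once for the baseline ISA and once with a per-function target attribute, and pick one at run time. This is only an illustration under assumptions, not the patch itself and not how libgfortran is organized. The names DEFINE_MATMUL, NO_ATTRS, matmul_generic, matmul_avx2, matmul_fn and select_matmul are all made up for this example, the kernel is a naive triple loop rather than the blocked one from the patch, and "avx2" is just an example feature. It relies on the GCC extensions __attribute__((target(...))) and __builtin_cpu_supports() on x86, and falls back to the generic version elsewhere.

```c
#include <stddef.h>

/* All names here are hypothetical; this is not libgfortran's code. */

/* Placeholder meaning "no extra function attributes". */
#define NO_ATTRS

/* Stamp out the same naive row-major n x n kernel under a given name,
   optionally with extra function attributes. */
#define DEFINE_MATMUL(NAME, ATTRS)                                   \
  ATTRS static void NAME(size_t n, const double *a,                  \
                         const double *b, double *c)                 \
  {                                                                  \
    for (size_t i = 0; i < n; i++)                                   \
      for (size_t j = 0; j < n; j++) {                               \
        double sum = 0.0;                                            \
        for (size_t k = 0; k < n; k++)                               \
          sum += a[i * n + k] * b[k * n + j];                        \
        c[i * n + j] = sum;                                          \
      }                                                              \
  }

/* Baseline version, built with whatever flags the file is compiled with. */
DEFINE_MATMUL(matmul_generic, NO_ATTRS)

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_VARIANT 1
/* Same source, but the compiler may use AVX2 instructions in this one. */
DEFINE_MATMUL(matmul_avx2, __attribute__((target("avx2"))))
#endif

typedef void (*matmul_fn)(size_t n, const double *a,
                          const double *b, double *c);

/* Pick an implementation based on what the running CPU supports. */
static matmul_fn select_matmul(void)
{
#ifdef HAVE_AVX2_VARIANT
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2"))
    return matmul_avx2;
#endif
  return matmul_generic;
}
```

A caller would fetch the pointer once, for example with matmul_fn f = select_matmul();, and then call f(n, a, b, c) for each product. The selection could just as well be done once at library start-up, and whether the extra code size is acceptable is a separate question.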