On Tue, Nov 15, 2016 at 1:38 PM, Thomas Koenig <tkoe...@netcologne.de> wrote:
> Hi,
>
> I ran some testing on the soon-to-be-committed matmul patch.
> Specifically, I tried out what putting -march=native into
> libgfortran's Makefile does.
>
> Here is the performance data with the new code without -march.
> The interesting numbers are the ones for Matmul fixed explicit,
> for size>=32.
>
> =========================================================
> ================            MEASURED GIGAFLOPS          =
> =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
> =========================================================
>     2   5000      0.067      0.079      0.053      0.064
>     4   5000      0.440      0.444      0.364      0.434
>     8   5000      1.405      1.152      1.368      1.495
>    16   5000      2.805      1.885      3.172      3.444
>    32   5000      4.943      3.627      7.267      7.510
>    64   5000      9.037      4.028      9.036      9.157
>   128   3829     10.181      4.452      9.932     10.333
>   256    477     10.398      4.720     10.919     11.158
>   512     59     11.173      4.853     11.172     11.356
>  1024      7     11.074      3.616     11.075     11.266
>
> With -march=native:
>
> =========================================================
> ================            MEASURED GIGAFLOPS          =
> =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
> =========================================================
>     2   5000      0.064      0.080      0.051      0.064
>     4   5000      0.406      0.450      0.347      0.407
>     8   5000      1.342      1.124      1.364      1.437
>    16   5000      2.989      1.865      3.427      3.760
>    32   5000      5.543      3.481      8.203      8.700
>    64   5000     11.632      4.021     11.647     11.729
>   128   3829     13.968      4.372     13.966     14.046
>   256    477     15.778      4.717     15.780     15.761
>   512     59     16.102      4.855     16.075     16.109
>  1024      7     15.867      3.596     15.884     15.886
>
> So, there could be quite some gain in performance if this
> could be exploited; even more for architectures like AVX-512,
> I suspect.
>
> Do you think this is worth pursuing? If so, how could/should this
> be implemented? Does anybody do this kind of thing in gcc yet?

Check out
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html,
in particular the target_clones attribute. Though I suspect you will
have a hard time with all kinds of more or less obscure targets.
Maybe just do it for x86-64?

-- 
Janne Blomqvist
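
To illustrate the target_clones suggestion, here is a rough sketch of how such an attribute might be applied to a matmul-style kernel. It is not the libgfortran code: the function name matmul_sketch, the macro MATMUL_CLONES, the choice of the "avx2" and "avx" clones, and the preprocessor guard are all assumptions made for the example. The guard reflects the assumption that function multi-versioning needs ifunc support, which is typically available on x86-64 GNU/Linux; elsewhere the macro expands to nothing and only the plain version is built.

```c
#include <stddef.h>

/* Assumption: only request clones on x86-64 GNU/Linux, where the
   dynamic linker can pick a version at load time via ifunc.  On any
   other target the attribute is dropped and a single default version
   is compiled.  */
#if defined(__x86_64__) && defined(__gnu_linux__) && defined(__GNUC__)
#define MATMUL_CLONES __attribute__((target_clones("avx2", "avx", "default")))
#else
#define MATMUL_CLONES
#endif

/* Hypothetical kernel: c = a * b for n x n row-major matrices.
   The compiler emits one copy per listed target plus a resolver
   that selects the best copy for the CPU the program runs on.  */
MATMUL_CLONES
void matmul_sketch(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            /* Innermost loop runs over contiguous memory, so each
               clone can be vectorized for its own instruction set.  */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Callers just call matmul_sketch() as usual; no -march flag is needed for the library as a whole, which is the appeal compared with building everything with -march=native. Whether the gain matches the numbers above would have to be measured.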