On Tue, Nov 15, 2016 at 1:38 PM, Thomas Koenig <tkoe...@netcologne.de> wrote:
> Hi,
>
> I ran some tests on the soon-to-be-committed matmul patch.
> Specifically, I tried out what putting -march=native into
> libgfortran's Makefile would do.
>
> Here is the performance data with the new code without -march.
> The interesting numbers are the ones for Matmul fixed explicit,
> for size >= 32.
>
>  =========================================================
>  ================            MEASURED GIGAFLOPS          =
>  =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
>  =========================================================
>     2  5000      0.067      0.079      0.053      0.064
>     4  5000      0.440      0.444      0.364      0.434
>     8  5000      1.405      1.152      1.368      1.495
>    16  5000      2.805      1.885      3.172      3.444
>    32  5000      4.943      3.627      7.267      7.510
>    64  5000      9.037      4.028      9.036      9.157
>   128  3829     10.181      4.452      9.932     10.333
>   256   477     10.398      4.720     10.919     11.158
>   512    59     11.173      4.853     11.172     11.356
>  1024     7     11.074      3.616     11.075     11.266
>
> With -march=native:
>
>  =========================================================
>  ================            MEASURED GIGAFLOPS          =
>  =========================================================
>                  Matmul                           Matmul
>                  fixed                 Matmul     variable
>  Size  Loops     explicit   refMatmul  assumed    explicit
>  =========================================================
>     2  5000      0.064      0.080      0.051      0.064
>     4  5000      0.406      0.450      0.347      0.407
>     8  5000      1.342      1.124      1.364      1.437
>    16  5000      2.989      1.865      3.427      3.760
>    32  5000      5.543      3.481      8.203      8.700
>    64  5000     11.632      4.021     11.647     11.729
>   128  3829     13.968      4.372     13.966     14.046
>   256   477     15.778      4.717     15.780     15.761
>   512    59     16.102      4.855     16.075     16.109
>  1024     7     15.867      3.596     15.884     15.886
>
> So there could be a substantial performance gain (roughly 35-50%
> for Matmul fixed explicit at size >= 128) if this could be
> exploited; even more on CPUs with AVX-512, I suspect.
>
> Do you think this is worth pursuing?  If so, how could/should this
> be implemented?  Does anybody already do this kind of thing in gcc?

Check out https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
and in particular the target_clones attribute.

Though I suspect you will have a hard time with all kinds of more or
less obscure targets. Maybe just do it for x86-64?
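For illustration, a minimal sketch of what target_clones looks like on a
simple kernel. This assumes GCC 6 or later on x86-64 with ifunc support
(e.g. glibc); the function name matmul_naive is hypothetical and is not
libgfortran's actual matmul routine. GCC emits one clone per listed target
plus a resolver that picks the best one at load time, so the same binary
runs on older CPUs via the "default" clone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: C = A * B for n x n row-major matrices.
   GCC builds an AVX2 clone and a baseline ("default") clone and
   dispatches between them at run time based on the host CPU. */
__attribute__((target_clones("avx2", "default")))
void matmul_naive(const double *a, const double *b, double *c, size_t n)
{
  assert(n == 0 || (a != NULL && b != NULL && c != NULL));

  /* Clear the result matrix. */
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      c[i * n + j] = 0.0;

  /* i-k-j loop order keeps the innermost loop contiguous in B and C,
     which gives the vectorizer something to work with in each clone. */
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      {
        double aik = a[i * n + k];
        for (size_t j = 0; j < n; j++)
          c[i * n + j] += aik * b[k * n + j];
      }
}
```

Calling it with A = {1, 2, 3, 4} and B = {5, 6, 7, 8} (2x2, row-major)
gives C = {19, 22, 43, 50}, whichever clone is selected.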



-- 
Janne Blomqvist
