Re: [patch, libfortran] AMD-specific versions of library matmul

Jerry DeLisle Thu, 25 May 2017 07:06:59 -0700

On 05/25/2017 03:45 AM, Thomas Koenig wrote:

> Hello world,
>
> the attached patch speeds up the library version of matmul for AMD chips
> by selecting AVX128 instructions and, depending on which instructions
> are supported, either FMA3 (aka FMA) or FMA4.
>
> Jerry tested this on his AMD systems, and found a speedup vs. the
> current code of around 10%.
>
> I have been unable to test this on a Ryzen system (the new compile farm
> machines won't accept my login yet).  From the benchmarks I have read,
> this method should also work fairly well on a Ryzen.
>
> So, OK for trunk?


Yes, OK.  Maybe test Ryzen first?

I just confirmed access to the Ryzen machines, so I plan to get set up and test
there.


Time to start looking under the hood.

cat /proc/cpuinfo gives for flags:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
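
The selection described in the patch (AVX128 with either FMA3 or FMA4) can be
sketched against a flags line like the one above. This is only an illustration
and not the code in the patch: the helper names has_flag and
pick_matmul_variant and the returned variant strings are hypothetical, and
preferring FMA3 over FMA4 when both are advertised is an assumption. The
library itself would presumably query the CPU directly rather than parse text.

```c
#include <string.h>

/* Return 1 if `flag` occurs in `flags` as a whole whitespace-delimited
   token, so that "avx" does not match "avx2" and "fma" does not match
   "fma4".  Hypothetical helper, for illustration only.  */
static int
has_flag (const char *flags, const char *flag)
{
  size_t n = strlen (flag);
  const char *p = flags;

  if (n == 0)
    return 0;

  while ((p = strstr (p, flag)) != NULL)
    {
      int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
      int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
      if (starts && ends)
        return 1;
      p += n;  /* Keep searching after this partial match.  */
    }
  return 0;
}

/* Pick a matmul variant name from a cpuinfo-style flags string.
   The variant names are made up for this sketch; the FMA3-before-FMA4
   ordering is an assumption.  */
static const char *
pick_matmul_variant (const char *flags)
{
  if (has_flag (flags, "avx") && has_flag (flags, "fma"))
    return "avx128_fma3";
  if (has_flag (flags, "avx") && has_flag (flags, "fma4"))
    return "avx128_fma4";
  return "generic";
}
```

The Ryzen flags above list avx and fma but not fma4, so under these
assumptions pick_matmul_variant would return "avx128_fma3" for that line.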
