> Hi,
> this patch enables logic which avoids FMA in the matrix multiplication
> loop for 256-bit vectors.  The underlying issue is the same as with znver1.
> While the combined latency of the multiply and add operations is higher
> than that of FMA, the dependency chain in matrix multiplication depends
> only on the additions, which are faster.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
> 	* config/i386/i386-options.c (ix86_option_override_internal): Default
> 	PARAM_AVOID_FMA_MAX_BITS to 256 for znver2.
> 	* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Set for
> 	ZNVER2.
Hi,
this patch is now also backported to the gcc9 branch (r273901).

Honza
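
For readers who want to see the kind of loop in question, here is a minimal sketch in C. It is not taken from the patch or from any benchmark; the function names dot_chain and matmul are made up for illustration. The accumulator `sum` is the loop-carried dependency chain: when the multiply and add are kept separate, the multiply does not depend on `sum`, so only the add latency sits on the chain, whereas a fused multiply-add would put the full FMA latency on it. Whether the compiler actually fuses the two operations depends on the target and flags in use; the sketch assumes PARAM_AVOID_FMA_MAX_BITS is reachable on the command line as `--param avoid-fma-max-bits=256`.

```c
#include <stddef.h>

/* Illustrative helper (hypothetical name): dot product of a contiguous
   row `a` with a strided column `b`.  `sum` is carried from one
   iteration to the next and forms the dependency chain.  */
double dot_chain(const double *a, const double *b, size_t n, size_t stride)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        /* The product does not depend on `sum`, so it can be computed
           ahead of time, off the critical path.  */
        double prod = a[k] * b[k * stride];
        /* Written this way, only the addition depends on the previous
           iteration.  A fused multiply-add would instead make the whole
           multiply-plus-add wait for the previous `sum`.  */
        sum = sum + prod;
    }
    return sum;
}

/* Illustrative helper (hypothetical name): C = A * B for n x n
   row-major matrices, built from the dot product above.  */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = dot_chain(a + i * n, b + j, n, n);
}
```

Comparing the assembly produced with and without the tuning (for example via -S) should show whether the inner loop uses a fused instruction or a separate multiply and add.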