This adjusts the Zen AVX256 vector load cost to be twice as expensive as the AVX128 vector load cost (the doubling comes from ix86_vec_cost keying on TARGET_AVX128_OPTIMAL; it should rather key on something like a TARGET_VECTOR_IMPL_WIDTH).
Likely the current cost value was meant to make AVX256 loads _cheaper_ than two AVX128 ones, in which case we'd have to use 5 here. Honza, what was the intent here? Maybe we should not call ix86_vec_cost for loads or stores at all, given we model the different sizes explicitly?

The odd thing is that in some places you pass ix86_vec_cost COSTS_N_INSNS (ix86_cost->...) and in others just ix86_cost->.... That's probably because most, but not all, entries in the cost tables are scaled via COSTS_N_INSNS, which makes them hard to compare... (I found a comment that says "We assume COSTS_N_INSNS is defined as (N)*4".)

That ix86_vec_cost adds a factor of two for AVX256 also throws throughput into the mix, because latency-wise AVX256 behaves the same as AVX128 AFAICS. That is, the modeling isn't very precise... (multiplying by two still makes sense IMHO, but for example stores have only one 128-bit port, so those are the ones that should be pessimized by more than a factor of two, if any.)

Anyhow, the patch removes one oddity when comparing costs of vectorized loops that have no lane-crossing operations: after the patch such loops cost the same with AVX128 and AVX256, even though Zen's frontend will benefit from using AVX256.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2018-10-08  Richard Biener  <rguent...@suse.de>

	* config/i386/x86-tune-costs.h (znver1_cost): Make AVX256 vector loads
	cost the same as AVX128 ones.

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 71a5854c09a..c7f3945d72c 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1518,9 +1518,9 @@ struct processor_costs znver1_cost = {
   {8, 8},				/* cost of storing MMX registers
 					   in SImode and DImode.  */
   2, 3, 6,				/* cost of moving XMM,YMM,ZMM register.  */
-  {6, 6, 6, 10, 20},			/* cost of loading SSE registers
+  {6, 6, 6, 6, 12},			/* cost of loading SSE registers
 					   in 32,64,128,256 and 512-bit.  */
-  {6, 6, 6, 10, 20},			/* cost of unaligned loads.  */
+  {6, 6, 6, 6, 12},			/* cost of unaligned loads.  */
   {8, 8, 8, 8, 16},			/* cost of storing SSE registers
 					   in 32,64,128,256 and 512-bit.  */
   {8, 8, 8, 8, 16},			/* cost of unaligned stores.  */