[PATCH][i386] Update Zen tuning for vector load cost

Richard Biener Mon, 08 Oct 2018 03:55:31 -0700


This adjusts the Zen AVX256 vector load cost to be twice the AVX128
vector load cost (the factor of two comes from ix86_vec_cost keying on
TARGET_AVX128_OPTIMAL - it should rather use some
TARGET_VECTOR_IMPL_WIDTH or so).
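
To make the arithmetic concrete, here is a minimal sketch of that
scaling.  model_vec_cost is a hypothetical stand-in, not the real
ix86_vec_cost; it assumes the only adjustment is charging the table
entry once per 128-bit chunk when the AVX128-optimal tuning is active.

```c
/* Hypothetical, simplified model of the width scaling described
   above; this is NOT the actual ix86_vec_cost from GCC.  */
static int
model_vec_cost (int bits, int table_cost, int avx128_optimal)
{
  /* Assumption: a core that handles wide vectors as 128-bit halves is
     charged table_cost once per 128-bit chunk.  */
  if (avx128_optimal && bits > 128)
    return table_cost * bits / 128;
  return table_cost;
}

/* Cost of COUNT loads of BITS width, given the raw table entry.  */
static int
model_loads_cost (int count, int bits, int table_cost)
{
  return count * model_vec_cost (bits, table_cost, 1);
}
```

Under this model two AVX128 loads (entry 6) cost 12.  One AVX256 load
costs 20 with the old entry of 10 and 12 with the new entry of 6.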


Likely the current cost value was meant to make AVX256 loads _cheaper_
than two AVX128 ones, in which case we'd have to use 5 here.  Honza,
what was the intent here?  Maybe not call ix86_vec_cost for loads or
stores at all (given we model different sizes explicitly)?  The
odd thing is that in some places you pass ix86_vec_cost
COSTS_N_INSNS (ix86_cost->...) and in others just
ix86_cost->....  That's probably caused by most, but not all, entries
in the cost tables being scaled via COSTS_N_INSNS, which makes them
not easily comparable... (I found a comment that says
"We assume COSTS_N_INSNS is defined as (N)*4")
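
A small sketch of why mixing the two scales hurts comparability.  The
macro uses the definition from the quoted comment; the helper name is
hypothetical and nothing here says which table entries are actually
stored pre-scaled.

```c
/* Definition taken from the quoted comment.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: returns how far apart two numerically equal
   table entries end up when one is passed through COSTS_N_INSNS and
   the other is used raw.  */
static int
scale_mismatch (int entry)
{
  int scaled = COSTS_N_INSNS (entry);	/* e.g. 6 becomes 24 */
  int raw = entry;			/* e.g. 6 stays 6 */
  return scaled - raw;
}
```

For an entry of 6 the two conventions differ by 18, so a cost passed
raw and a cost passed via COSTS_N_INSNS cannot be compared directly
without first bringing them onto the same scale.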

That ix86_vec_cost adds a factor of two for AVX256 also throws
throughput into the mix, because latency-wise AVX256 behaves the
same as AVX128 AFAICS.  That is, the modeling isn't very precise...
(multiplying by two still makes sense IMHO - but for example stores
have only one 128-bit port, so those are the ones that should be
pessimized by more than a factor of two, if any)

Anyhow, the patch removes one oddity when comparing costs of vectorized
loops when there are no lane-crossing operations (after the patch
such loops will cost the same with AVX128 and AVX256 even though
Zen's frontend will benefit from using AVX256).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2018-10-08  Richard Biener  <rguent...@suse.de>

        * config/i386/x86-tune-costs.h (znver1_cost): Make AVX256 vector loads
        cost the same as AVX128 ones.

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 71a5854c09a..c7f3945d72c 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1518,9 +1518,9 @@ struct processor_costs znver1_cost = {
   {8, 8},                              /* cost of storing MMX registers
                                           in SImode and DImode.  */
  2, 3, 6,                             /* cost of moving XMM,YMM,ZMM register.  */
-  {6, 6, 6, 10, 20},                   /* cost of loading SSE registers
+  {6, 6, 6, 6, 12},                    /* cost of loading SSE registers
                                           in 32,64,128,256 and 512-bit.  */
-  {6, 6, 6, 10, 20},                   /* cost of unaligned loads.  */
+  {6, 6, 6, 6, 12},                    /* cost of unaligned loads.  */
   {8, 8, 8, 8, 16},                    /* cost of storing SSE registers
                                           in 32,64,128,256 and 512-bit.  */
   {8, 8, 8, 8, 16},                    /* cost of unaligned stores.  */
