Hi,
On ThunderX 2, the logical shifts that are part of an address cost one extra cycle, and this is not modeled correctly. For induction variables we don't want to do the shift. When I change the cost for the shift in addresses, I get a 12% improvement on HMMER; all other benchmarks in SPEC CPU 2006 were neutral.
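For illustration only, here is a minimal sketch of the kind of loop where this matters. It is a hypothetical kernel, not code taken from HMMER, and the function and parameter names are made up. Each access can be addressed either as base plus a shifted index (for example `ldr w0, [x1, x2, lsl 2]`) or through a separate pointer that is bumped each iteration; which form the compiler picks depends on the address costs, the compiler version, and the flags used.

```c
#include <stddef.h>

/* Hypothetical example: three int arrays indexed by one induction
   variable k.  With a zero cost for the scaled-index address form,
   the compiler may address every access as base + (k << 2); with a
   non-zero cost it may prefer separate pointer induction variables.  */
void
add_scores (int *mc, const int *mpp, const int *tpmm, size_t n)
{
  for (size_t k = 1; k < n; k++)
    mc[k] = mpp[k - 1] + tpmm[k - 1];
}
```

Compiling this with and without the patch at -O2 -mcpu=thunderx2t99 and comparing the assembly is one way to see whether the addressing form changes.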
OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64.c (thunderx2t99_addrcost_table): Improve cost table.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 244839)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -273,9 +273,9 @@ static const struct cpu_addrcost_table q
 static const struct cpu_addrcost_table thunderx2t99_addrcost_table =
 {
     {
-      0, /* hi  */
-      0, /* si  */
-      0, /* di  */
+      1, /* hi  */
+      1, /* si  */
+      1, /* di  */
       2, /* ti  */
     },
   0, /* pre_modify  */