On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote: > Hi, > Currently for the following function: > int f(int a, int b) > { > return a + (b <<7); > } > > GCC produces: > add w0, w0, w1, lsl 7 > But for ThunderX 1, it is better if the instruction was split allowing > better scheduling to happen in most cases, the latency is the same. I > get a small improvement in coremarks, ~1%. > > Currently the code does not take into account Arith_shift even though > the comment: > /* Strip any extend, leave shifts behind as we will > cost them through mult_cost. */ > Say it does not strip out the shift, aarch64_strip_extend does and has > always has since the back-end was added to GCC. > > Once I fixed the code around aarch64_strip_extend, I got a regression > for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero > extends) are considered free so I needed to add a new tuning flag. > > Note I will get an even more improvement for ThunderX 2 CN99XX, but I > have not measured it yet as I have not made the change to > aarch64-cost-tables.h yet as I am waiting for approval of the renaming > patch first before submitting any of the cost table changes. Also I > noticed this problem with this tuning first and then looked back at > what I needed to do for ThunderX 1. > > OK? Bootstrapped and tested on aarch64-linux-gnu without any > regressions (both with and without --with-cpu=thunderx).
This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap or free seems better. I have some other minor points below. > Index: config/aarch64/aarch64-tuning-flags.def > =================================================================== > --- config/aarch64/aarch64-tuning-flags.def (revision 243974) > +++ config/aarch64/aarch64-tuning-flags.def (working copy) > @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte > pairs. */ > AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) > > +/* Logical shift left <=4 with/without zero extend are considered easy > + extended, also zero extends without the shift. */ I'm struggling to parse this comment. "also zero extends without the shift" is what is getting me. I'm also not certain I follow when I should set this flag. If all shifts are cheap/free on my platform, should I set this flag? > +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND) > + > #undef AARCH64_EXTRA_TUNING_OPTION > + > +/* Return true iff X is an easy shift without a sign extend. */ > + Again I don't like calling <= 4 "easy", it feels imprecise. Thanks, James