On Thu, Nov 26, 2015 at 09:50:50AM +0000, Kyrill Tkachov wrote:
> As I mentioned on IRC, this patch improves codegen on aarch64 as well.
> I've re-checked SPEC2006 and it seems to improve codegen around
> multiply-extend-accumulate instructions. For example the sequence:
>     mov w4, 64
>     mov x1, 24
>     smaddl x1, w9, w4, x1 // multiply-sign-extend-accumulate
>     add x1, x3, x1
>
> becomes something like this:
>     mov w3, 64
>     smaddl x1, w9, w3, x0
>     add x1, x1, 24 // constant 24 propagated into the add

So combine isn't smart enough to combine those last three into those last
two. Yeah, that makes sense.

> Another was transforming the multiply-extend into something cheaper:
>     mov x0, 40
>     mov w22, 32
>     umaddl x22, w21, w22, x0 // multiply-zero-extend-accumulate
>
> becomes:
>     ubfiz x22, x21, 5, 32 // ASHIFT+extend
>     add x22, x22, 40
>
> which should always be beneficial.

But it only applies given some other preconditions, right? Either way,
it makes sense that this one is also too complicated for combine.

> From what I can see we don't lose any of the multiply-extend-accumulate
> opportunities that we gained from the original combine patch.
>
> So can we take this patch in as well?

See the patch mail...


Segher