On 17/08/12 16:06, Andrew Stubbs wrote:
> On 17/08/12 15:47, Richard Earnshaw wrote:
>> If we don't have a 16x16->64 mult operation then after step 1 we'll
>> still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2
>> there's nothing to short circuit.
>>
>> Unless, of course, you're expecting us to get
>>
>> step1 -> 16x16->32 widen mult
>> step2 -> widen64(step1) + acc64
>
> No, given a u16xu16->u64 operation in the code, and that the arch
> doesn't have such an opcode, I'd expect to get
>
> step1 -> (u32)u16 x (u32)u16 -> u64

Hmm, I would have thought that would be more costly than

(u64)(u16 x u16 -> u32)

>
> Likewise, 8x8->32 might give (16)8x(16)8->32.
>
> The code can't see that the widening operation is non-optimal without
> looking beyond into its inputs.

Ok, in which case we have to give is_widening_mult_rhs_p enough smarts
to not strip (s32)u32 and return u32.

I'll have another think about it.

R.
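
For readers following the thread, here is a rough C sketch of the two decompositions being compared, plus the accumulate step from "step2". It is only an illustration of the arithmetic at source level, not of what the widening-multiply pass actually emits. All function names are hypothetical. Which form is cheaper depends on the target's multiply patterns, and the sketch makes no claim about that.

```c
#include <assert.h>
#include <stdint.h>

/* "(u32)u16 x (u32)u16 -> u64": extend the inputs to 32 bits first,
   then do a 32x32->64 widening multiply.  (Hypothetical name.)  */
uint64_t
mul_extend_inputs (uint16_t a, uint16_t b)
{
  uint32_t wa = a;
  uint32_t wb = b;
  return (uint64_t) wa * (uint64_t) wb;
}

/* "(u64)(u16 x u16 -> u32)": do a 16x16->32 widening multiply,
   then extend the 32-bit product to 64 bits.  The product cannot
   overflow 32 bits: 0xFFFF * 0xFFFF == 0xFFFE0001.  */
uint64_t
mul_extend_result (uint16_t a, uint16_t b)
{
  uint32_t product = (uint32_t) a * (uint32_t) b;
  return (uint64_t) product;
}

/* "step2 -> widen64(step1) + acc64": accumulate the widened product.  */
uint64_t
macc_extend_result (uint64_t acc, uint16_t a, uint16_t b)
{
  return acc + mul_extend_result (a, b);
}

/* Both forms compute the same value; spot-check a few inputs,
   including the extremes.  */
void
check_forms_agree (void)
{
  static const uint16_t samples[] = { 0, 1, 2, 255, 256, 32767, 32768, 65535 };
  const unsigned n = sizeof samples / sizeof samples[0];

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (mul_extend_inputs (samples[i], samples[j])
              == mul_extend_result (samples[i], samples[j]));
}
```

The two multiply functions return the same value for unsigned 16-bit inputs, so the choice between them is purely a cost question, which is the point under discussion above.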