On 10/21/2013 03:01 PM, Yufeng Zhang wrote:
>
> This patch changes the widening_mul pass to fuse the widening multiply with
> accumulate only when the multiply has single use.  The widening_mul pass
> currently does the conversion regardless of the number of the uses, which can
> cause poor code-gen in cases like the following:
>
> typedef int ArrT [10][10];
>
> void
> foo (ArrT Arr, int Idx)
> {
>   Arr[Idx][Idx] = 1;
>   Arr[Idx + 10][Idx] = 2;
> }
>
> On AArch64, after widening_mul, the IR is like
>
>   _2 = (long unsigned int) Idx_1(D);
>   _3 = Idx_1(D) w* 40;                           <----
>   _5 = Arr_4(D) + _3;
>   *_5[Idx_1(D)] = 1;
>   _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>; <----
>   _9 = Arr_4(D) + _8;
>   *_9[Idx_1(D)] = 2;
>
> Where the arrows point, there are redundant widening multiplies.

So they're redundant.  Why does this imply poor code-gen?

If a target has more than one FMA unit, then the target might be able to
issue the computations for _3 and _8 in parallel.  Even if the target has
only one FMA unit, as long as that unit is pipelined the two computations
could overlap.


r~
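
For reference, the two shapes under discussion can be written out in plain C.
This is only a sketch of the arithmetic, not of the pass itself: the function
and constant names are hypothetical, a row of int[10] is assumed to be 40
bytes, and int64_t stands in for the 64-bit type used in the IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: one row of int[10] is assumed to be 40 bytes,
   so skipping 10 rows adds 400 bytes.  */
enum { ROW_BYTES = 40, SKIP_BYTES = 10 * ROW_BYTES };

/* Shape in the quoted IR: a widening multiply for the first offset and
   a separate widening multiply-add for the second.  Neither depends on
   the other's result.  */
static void
offsets_fused (int idx, int64_t *off1, int64_t *off2)
{
  *off1 = (int64_t) idx * ROW_BYTES;               /* idx w* 40 */
  *off2 = (int64_t) idx * ROW_BYTES + SKIP_BYTES;  /* WIDEN_MULT_PLUS_EXPR */
}

/* Shape with the multiply kept shared: one widening multiply, then a
   plain add that depends on its result.  */
static void
offsets_shared (int idx, int64_t *off1, int64_t *off2)
{
  int64_t prod = (int64_t) idx * ROW_BYTES;
  *off1 = prod;
  *off2 = prod + SKIP_BYTES;
}

/* Both shapes must produce the same byte offsets.  */
static void
check_offsets (int idx)
{
  int64_t f1, f2, s1, s2;
  offsets_fused (idx, &f1, &f2);
  offsets_shared (idx, &s1, &s2);
  assert (f1 == s1);
  assert (f2 == s2);
}
```

The fused shape issues two multiplies with no dependency between them, while
the shared shape issues one multiply followed by a dependent add.  Which one
is cheaper depends on the target's multiply latency and issue width.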