On 10/21/2013 03:01 PM, Yufeng Zhang wrote:
> 
> This patch changes the widening_mul pass to fuse the widening multiply with
> accumulate only when the multiply has a single use.  The widening_mul pass
> currently does the conversion regardless of the number of uses, which can
> cause poor code-gen in cases like the following:
> 
> typedef int ArrT [10][10];
> 
> void
> foo (ArrT Arr, int Idx)
> {
>   Arr[Idx][Idx] = 1;
>   Arr[Idx + 10][Idx] = 2;
> }
> 
> On AArch64, after widening_mul, the IR is like
> 
>   _2 = (long unsigned int) Idx_1(D);
>   _3 = Idx_1(D) w* 40;                           <----
>   _5 = Arr_4(D) + _3;
>   *_5[Idx_1(D)] = 1;
>   _8 = WIDEN_MULT_PLUS_EXPR <Idx_1(D), 40, 400>; <----
>   _9 = Arr_4(D) + _8;
>   *_9[Idx_1(D)] = 2;
> 
> Where the arrows point, there are redundant widening multiplies.

So they're redundant.  Why does this imply poor code-gen?

If a target has more than one FMA unit, then the target might
be able to issue the computation for _3 and _8 in parallel.

Even if the target has only one FMA unit, if that unit is
pipelined the computations could still overlap.
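
To make the dependency argument concrete, here is a rough C model of the
two offsets in the quoted IR.  It is only a sketch: the function names are
hypothetical, it is not GCC code, and it assumes a 32-bit int widened to
64 bits as on AArch64.  offset_mul and offset_madd each depend only on
idx, so neither has to wait for the other; offset_reuse saves a multiply
but chains the add behind the first product.

```c
#include <assert.h>
#include <stdint.h>

/* Models "_3 = Idx w* 40": widen the 32-bit index, then multiply. */
uint64_t offset_mul(int32_t idx)
{
    return (uint64_t)((int64_t)idx * 40);
}

/* Models "_8 = WIDEN_MULT_PLUS_EXPR <Idx, 40, 400>": a fused widening
   multiply-accumulate whose only input is idx. */
uint64_t offset_madd(int32_t idx)
{
    return (uint64_t)((int64_t)idx * 40 + 400);
}

/* Hypothetical single-use alternative: reuse the first product and add
   the constant, so this result depends on offset_mul finishing first. */
uint64_t offset_reuse(uint64_t product)
{
    return product + 400;
}

/* Both formulations yield the same offset; only the dependency chain
   between the two computations differs. */
void check_offsets(int32_t idx)
{
    assert(offset_reuse(offset_mul(idx)) == offset_madd(idx));
}
```

Which form is faster depends on the target's multiplier latency and issue
width, which this model does not capture.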


r~
