> On Wed, Jun 14, 2017 at 1:14 PM, Prachi Godbole > <prachi.godb...@imgtec.com> wrote: > > I'm developing a solution to optimize away intermediate stores (loads) for > static local variables which are assigned to before referenced on every path > through a function. > > > > Currently GCC eliminates all loads/stores in a straight line code but not in > control structures. AFAIU, passes like sccvn, lim (store motion in particular) > sink etc. try to propagate the assignment exprs through SSA vars instead of > through memory thereby eliminating almost all loads and some stores but > still not all. > > > > For example: > > > > void some_function() > > { > > static some_type sum; > > ... > > for( i = 0 ; i < n ; i++ ) > > { > > sum = matrix [i][n] ; > > > > for( j = 0 ; j < i ; j++ ) > > { > > sum -= matrix [i][j] * matrix [j][n] ; > > } > > > > matrix [i][n] = sum ; > > } > > ... > > } > > > > Resulting SSA: > > > > ... > > <bb 14> [2.25%]: > > _10 = matrix[0][n_13(D)]; > > sum = _10; > > goto <bb 10>; [100.00%] > > ... > > <bb 4> [12.75%]: > > _1 = matrix[i_16][n_13(D)]; > > if (i_16 > 0) > > goto <bb 6>; [100.00%] > > else > > goto <bb 9>; [0.00%] > > ... > > <bb 6> [85.00%]: > > # j_24 = PHI <j_18(6), 0(4)> > > # sum_lsm.4_27 = PHI <_6(6), _1(4)> > > _3 = matrix[i_16][j_24]; > > _4 = matrix[j_24][n_13(D)]; > > _5 = _3 * _4; > > _6 = sum_lsm.4_27 - _5; > > j_18 = j_24 + 1; > > if (i_16 > j_18) > > goto <bb 6>; [85.00%] > > else > > goto <bb 7>; [15.00%] > > ... > > <bb 7> [12.75%]: > > # _40 = PHI <_6(6)> > > goto <bb 9>; [100.00%] > > > > <bb 9> [12.75%]: > > # sum_lsm.4_19 = PHI <_40(7), _1(4)> > > > > <bb 10> [15.00%]: > > # i_20 = PHI <i_16(9), 0(14)> > > # sum_lsm.4_8 = PHI <sum_lsm.4_19(9), _10(14)> > > # sum_lsm.5_22 = PHI <1(9), 0(14)> > > matrix[i_20][n_13(D)] = sum_lsm.4_8; > > i_16 = i_20 + 1; > > if (n_13(D) > i_16) > > goto <bb 4>; [85.00%] > > else > > goto <bb 11>; [15.00%] > > > > <bb 11> [2.25%]: > > # sum_lsm.4_39 = PHI <sum_lsm.4_8(10)> > > # sum_lsm.5_38 = PHI <sum_lsm.5_22(10)> > > if (sum_lsm.5_38 != 0) > > goto <bb 12>; [0.00%] > > else > > goto <bb 13>; [100.00%] > > > > <bb 12> [0.00%]: > > sum = sum_lsm.4_39; > > ... > > > > Although all intermediate loads are eliminated here, store to sum before > and after the 'j' loop (see bb 14 and bb 12) still remain which can be > eliminated. > > > > My solution would reuse a lot of data structures and routines from the sink > pass. So my question is, can I add this optimization into the sink pass > itself or > would it require a new pass and if yes, what would be the ideal place for it. > > Any other pointers are more than welcome. > > So the idea is that for any store to such variable that is dominated by a > kill the > function exit doesn't count as a use, right? > (unless the variable escapes the scope of the function, for example via a > nested function -- which might be the case that is very difficult to detect). > Yes that is the general idea. I had thought of initially being conservative and not optimizing at all when the variable could potentially escape the scope.
> I'm not sure what part of the sink pass is suitable for this but as sinking > walks > post dominators backwards which also DSE does it would fit DSE as well, no? > You'd pre-compute blocks containing kills (or do two backwards passes) and > then special case GIMPLE_RETURN in dse_classify_store where previously > ref_maybe_used_by_stmt_p covers those. > I was looking for a one time solution rather than doing it as and when the opportunities present themselves. But doing it in DSE is more advantageous, I can see that. So I'll go ahead with DSE. > nested fn case: > > void foo (void) > { > static int i = 0; > int __attribute__((noinline)) bar () { return i; } > int j = bar (); > i = 1; > } > > I think we don't mark 'i' TREE_ADDRESSABLE (escaped) here so we simply > treat 'i' as a global variable accessible from other scopes. > > Richard. > > > Thanks, > > Prachi