Yufeng Zhang <yufeng.zh...@arm.com> wrote: >On 12/03/13 14:20, Richard Biener wrote: >> On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang<yufeng.zh...@arm.com> >wrote: >>> On 12/03/13 06:48, Jeff Law wrote: >>>> >>>> On 12/02/13 08:47, Yufeng Zhang wrote: >>>>> >>>>> Ping~ >>>>> >>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html >>>> >>>> >>>>> >>>>> Thanks, >>>>> Yufeng >>>>> >>>>> On 11/26/13 15:02, Yufeng Zhang wrote: >>>>>> >>>>>> On 11/26/13 12:45, Richard Biener wrote: >>>>>>> >>>>>>> On Thu, Nov 14, 2013 at 12:25 AM, Yufeng >>>>>>> Zhang<yufeng.zh...@arm.com> wrote: >>>>>>>> >>>>>>>> On 11/13/13 20:54, Bill Schmidt wrote: >>>>>>>>> >>>>>>>>> The second version of your original patch is ok with me with >the >>>>>>>>> following changes. Sorry for the little side adventure into >the >>>>>>>>> next-interp logic; in the end that's going to hurt more than >it >>>>>>>>> helps in >>>>>>>>> this case. Thanks for having a look at it, anyway. Thanks >also for >>>>>>>>> cleaning up this version to be less intrusive to common >interfaces; I >>>>>>>>> appreciate it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks a lot for the review. I've attached an updated patch >with the >>>>>>>> suggested changes incorporated. >>>>>>>> >>>>>>>> For the next-interp adventure, I was quite happy to do the >>>>>>>> experiment; it's >>>>>>>> a good chance of gaining insight into the pass. Many thanks >for >>>>>>>> your prompt >>>>>>>> replies and patience in guiding! >>>>>>>> >>>>>>>> >>>>>>>>> Everything else looks OK to me. Please ask Richard for final >>>>>>>>> approval, >>>>>>>>> as I'm not a maintainer. >>>> >>>> First a note, I need to check on voting for Bill as the slsr >maintainer >>>> from the steering committee. Voting was in progress just before >the >>>> close of stage1 development so I haven't tallied the results :-) >>> >>> >>> Looking forward to some good news! :) >>> >>> >>>>>> >>>>>> Yes, you are right about the non-trivial 'base' tree are rarely >shared. >>>>>> The cached is introduced mainly because get_alternative_base >() may >>>>>> be >>>>>> called twice on the same 'base' tree, once in the >>>>>> find_basis_for_candidate () for look-up and the other time in >>>>>> alloc_cand_and_find_basis () for record_potential_basis (). I'm >happy >>>>>> to leave out the cache if you think the benefit is trivial. >>>> >>>> Without some sense of how expensive the lookups are vs how often >the >>>> cache hits it's awful hard to know if the cache is worth it. >>>> >>>> I'd say take it out unless you have some sense it's really saving >time. >>>> It's a pretty minor implementation detail either way. >>> >>> >>> I think the affine tree routines are generally expensive; it is >worth having >>> a cache to avoid calling them too many times. I run the slsr-*.c >tests >>> under gcc.dg/tree-ssa/ and find out that the cache hit rates range >from >>> 55.6% to 90%, with 73.5% as the average. The samples may not well >represent >>> the real world scenario, but they do show the fact that the 'base' >tree can >>> be shared to some extent. So I'd like to have the cache in the >patch. >>> >>> >>>> >>>>>> >>>>>>> +/* { dg-do compile } */ >>>>>>> +/* { dg-options "-O2 -fdump-tree-slsr" } */ >>>>>>> + >>>>>>> +typedef int arr_2[50][50]; >>>>>>> + >>>>>>> +void foo (arr_2 a2, int v1) >>>>>>> +{ >>>>>>> + int i, j; >>>>>>> + >>>>>>> + i = v1 + 5; >>>>>>> + j = i; >>>>>>> + a2 [i-10] [j] = 2; >>>>>>> + a2 [i] [j++] = i; >>>>>>> + a2 [i+20] [j++] = i; >>>>>>> + a2 [i-3] [i-1] += 1; >>>>>>> + return; >>>>>>> +} >>>>>>> + >>>>>>> +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ >>>>>>> +/* { dg-final { cleanup-tree-dump "slsr" } } */ >>>>>>> >>>>>>> scanning for 5 MEMs looks non-sensical. What transform do >>>>>>> you expect? I see other slsr testcases do similar non-sensical >>>>>>> checking which is bad, too. >>>>>> >>>>>> >>>>>> As the slsr optimizes CAND_REF candidates by simply lowering them >to >>>>>> MEM_REF from e.g. ARRAY_REF, I think scanning for the number of >MEM_REFs >>>>>> is an effective check. Alternatively, I can add a follow-up >patch to >>>>>> add some dumping facility in replace_ref () to print out the >replacing >>>>>> actions when -fdump-tree-slsr-details is on. >>>> >>>> I think adding some details to the dump and scanning for them would >be >>>> better. That's the only change that is required for this to move >forward. >>> >>> >>> I've updated to patch to dump more details when >-fdump-tree-slsr-details is >>> on. The tests have also been updated to scan for these new dumps >instead of >>> MEMs. >>> >>> >>>> >>>> I suggest doing it quickly. We're well past stage1 close at this >point. >>> >>> >>> The bootstrapping on x86_64 is still running. OK to commit if it >succeeds? >> >> I still don't like it. It's using the wrong and too expensive tools >to do >> stuff. What kind of bases are we ultimately interested in? Browsing >> the code it looks like we're having >> >> /* Base expression for the chain of candidates: often, but not >> always, an SSA name. */ >> tree base_expr; >> >> which isn't really too informative but I suppose they are all >> kind-of-gimple_val()s? That said, I wonder if you can simply >> use get_addr_base_and_unit_offset in place of get_alternative_base >(), >> ignoring the returned offset. > >'base_expr' is essentially the base address of a handled_component_p, >e.g. ARRAY_REF, COMPONENT_REF, etc. In most case, it is the address of > >the object returned by get_inner_reference (). > >Given a test case like the following: > >typedef int arr_2[20][20]; > >void foo (arr_2 a2, int i, int j) >{ > a2[i+10][j] = 1; > a2[i+10][j+1] = 1; > a2[i+20][j] = 1; >} > >The IR before SLSR is (on x86_64): > > _2 = (long unsigned int) i_1(D); > _3 = _2 * 80; > _4 = _3 + 800; > _6 = a2_5(D) + _4; > *_6[j_8(D)] = 1; > _10 = j_8(D) + 1; > *_6[_10] = 1; > _12 = _3 + 1600; > _13 = a2_5(D) + _12; > *_13[j_8(D)] = 1; > >The base_expr for the 1st and 2nd memory reference are the same, i.e. >_6, while the base_expr for a2[i+20][j] is _13. > >_13 is essentially (_6 + 800), so all of the three memory references >essentially share the same base address. As their strides are also the > >same (MULT_EXPR (j, 4)), the three references can all be lowered to >MEM_REFs. What this patch does is to use the tree affine tools to help > >recognize the underlying base address expression; as it requires >looking >into the definitions of SSA_NAMEs, get_addr_base_and_unit_offset () >won't help here. > >Bill has helped me exploit other ways of achieving this in SLSR, but so > >far we think this is the best way to proceed. The use of tree affine >routines has been restricted to CAND_REFs only and there is the >aforementioned cache facility to help reduce the overhead. > >Thanks, >Yufeng > >P.S. some more details what the patch does: > >The CAND_REF for the three memory references are: > > 6 [2] *_6[j_8(D)] = 1; > REF : _6 + ((sizetype) j_8(D) * 4) + 0 : int[20] * > basis: 0 dependent: 8 sibling: 0 > next-interp: 0 dead-savings: 0 > > 8 [2] *_6[_10] = 1; > REF : _6 + ((sizetype) j_8(D) * 4) + 4 : int[20] * > basis: 6 dependent: 11 sibling: 0 > next-interp: 0 dead-savings: 0 > > 11 [2] *_13[j_8(D)] = 1; > REF : _13 + ((sizetype) j_8(D) * 4) + 0 : int[20] * > basis: 8 dependent: 0 sibling: 0 > next-interp: 0 dead-savings: 0 > >Before the patch, the strength reduction candidate chains for the three > >CAND_REFs are: > > _6 -> 6 -> 8 > _13 -> 11 > >i.e. SLSR recognizes the first two references share the same basis, >while the last one is on it own. > >With the patch, an extra candidate chain can be recognized: > > a2_5(D) + (sizetype) i_1(D) * 80 -> 6 -> 11 -> 8 > >i.e. all of the three references are found to have the same basis >(a2_5(D) + (sizetype) i_1(D) * 80), which is essentially the expanded >_6 >or _13, with the immediate offset removed. The pass is now able to >lower all of the three references, instead of the first two only, to >MEM_REFs.
Ok, so slsr handles arbitrary complex bases and figures out common components? If so, then why not just use get_inner_reference? After all slsr does not use tree-affine as representation for bases (which it could?) Richard.