Yufeng Zhang <yufeng.zh...@arm.com> wrote:
>On 12/03/13 14:20, Richard Biener wrote:
>> On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang<yufeng.zh...@arm.com> 
>wrote:
>>> On 12/03/13 06:48, Jeff Law wrote:
>>>>
>>>> On 12/02/13 08:47, Yufeng Zhang wrote:
>>>>>
>>>>> Ping~
>>>>>
>>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Yufeng
>>>>>
>>>>> On 11/26/13 15:02, Yufeng Zhang wrote:
>>>>>>
>>>>>> On 11/26/13 12:45, Richard Biener wrote:
>>>>>>>
>>>>>>> On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
>>>>>>> Zhang<yufeng.zh...@arm.com>     wrote:
>>>>>>>>
>>>>>>>> On 11/13/13 20:54, Bill Schmidt wrote:
>>>>>>>>>
>>>>>>>>> The second version of your original patch is ok with me with
>the
>>>>>>>>> following changes.  Sorry for the little side adventure into
>the
>>>>>>>>> next-interp logic; in the end that's going to hurt more than
>it
>>>>>>>>> helps in
>>>>>>>>> this case.  Thanks for having a look at it, anyway.  Thanks
>also for
>>>>>>>>> cleaning up this version to be less intrusive to common
>interfaces; I
>>>>>>>>> appreciate it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks a lot for the review.  I've attached an updated patch
>with the
>>>>>>>> suggested changes incorporated.
>>>>>>>>
>>>>>>>> For the next-interp adventure, I was quite happy to do the
>>>>>>>> experiment; it's
>>>>>>>> a good chance of gaining insight into the pass.  Many thanks
>for
>>>>>>>> your prompt
>>>>>>>> replies and patience in guiding!
>>>>>>>>
>>>>>>>>
>>>>>>>>> Everything else looks OK to me.  Please ask Richard for final
>>>>>>>>> approval,
>>>>>>>>> as I'm not a maintainer.
>>>>
>>>> First a note, I need to check on voting for Bill as the slsr
>maintainer
>>>> from the steering committee.   Voting was in progress just before
>the
>>>> close of stage1 development so I haven't tallied the results :-)
>>>
>>>
>>> Looking forward to some good news! :)
>>>
>>>
>>>>>>
>>>>>> Yes, you are right about the non-trivial 'base' tree are rarely
>shared.
>>>>>>      The cached is introduced mainly because get_alternative_base
>() may
>>>>>> be
>>>>>> called twice on the same 'base' tree, once in the
>>>>>> find_basis_for_candidate () for look-up and the other time in
>>>>>> alloc_cand_and_find_basis () for record_potential_basis ().  I'm
>happy
>>>>>> to leave out the cache if you think the benefit is trivial.
>>>>
>>>> Without some sense of how expensive the lookups are vs how often
>the
>>>> cache hits it's awful hard to know if the cache is worth it.
>>>>
>>>> I'd say take it out unless you have some sense it's really saving
>time.
>>>>     It's a pretty minor implementation detail either way.
>>>
>>>
>>> I think the affine tree routines are generally expensive; it is
>worth having
>>> a cache to avoid calling them too many times.  I run the slsr-*.c
>tests
>>> under gcc.dg/tree-ssa/ and find out that the cache hit rates range
>from
>>> 55.6% to 90%, with 73.5% as the average.  The samples may not well
>represent
>>> the real world scenario, but they do show the fact that the 'base'
>tree can
>>> be shared to some extent.  So I'd like to have the cache in the
>patch.
>>>
>>>
>>>>
>>>>>>
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-tree-slsr" } */
>>>>>>> +
>>>>>>> +typedef int arr_2[50][50];
>>>>>>> +
>>>>>>> +void foo (arr_2 a2, int v1)
>>>>>>> +{
>>>>>>> +  int i, j;
>>>>>>> +
>>>>>>> +  i = v1 + 5;
>>>>>>> +  j = i;
>>>>>>> +  a2 [i-10] [j] = 2;
>>>>>>> +  a2 [i] [j++] = i;
>>>>>>> +  a2 [i+20] [j++] = i;
>>>>>>> +  a2 [i-3] [i-1] += 1;
>>>>>>> +  return;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
>>>>>>> +/* { dg-final { cleanup-tree-dump "slsr" } } */
>>>>>>>
>>>>>>> scanning for 5 MEMs looks non-sensical.  What transform do
>>>>>>> you expect?  I see other slsr testcases do similar non-sensical
>>>>>>> checking which is bad, too.
>>>>>>
>>>>>>
>>>>>> As the slsr optimizes CAND_REF candidates by simply lowering them
>to
>>>>>> MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
>MEM_REFs
>>>>>> is an effective check.  Alternatively, I can add a follow-up
>patch to
>>>>>> add some dumping facility in replace_ref () to print out the
>replacing
>>>>>> actions when -fdump-tree-slsr-details is on.
>>>>
>>>> I think adding some details to the dump and scanning for them would
>be
>>>> better.  That's the only change that is required for this to move
>forward.
>>>
>>>
>>> I've updated to patch to dump more details when
>-fdump-tree-slsr-details is
>>> on.  The tests have also been updated to scan for these new dumps
>instead of
>>> MEMs.
>>>
>>>
>>>>
>>>> I suggest doing it quickly.  We're well past stage1 close at this
>point.
>>>
>>>
>>> The bootstrapping on x86_64 is still running.  OK to commit if it
>succeeds?
>>
>> I still don't like it.  It's using the wrong and too expensive tools
>to do
>> stuff.  What kind of bases are we ultimately interested in?  Browsing
>> the code it looks like we're having
>>
>>    /* Base expression for the chain of candidates:  often, but not
>>       always, an SSA name.  */
>>    tree base_expr;
>>
>> which isn't really too informative but I suppose they are all
>> kind-of-gimple_val()s?  That said, I wonder if you can simply
>> use get_addr_base_and_unit_offset in place of get_alternative_base
>(),
>> ignoring the returned offset.
>
>'base_expr' is essentially the base address of a handled_component_p, 
>e.g. ARRAY_REF, COMPONENT_REF, etc.  In most case, it is the address of
>
>the object returned by get_inner_reference ().
>
>Given a test case like the following:
>
>typedef int arr_2[20][20];
>
>void foo (arr_2 a2, int i, int j)
>{
>   a2[i+10][j] = 1;
>   a2[i+10][j+1] = 1;
>   a2[i+20][j] = 1;
>}
>
>The IR before SLSR is (on x86_64):
>
>   _2 = (long unsigned int) i_1(D);
>   _3 = _2 * 80;
>   _4 = _3 + 800;
>   _6 = a2_5(D) + _4;
>   *_6[j_8(D)] = 1;
>   _10 = j_8(D) + 1;
>   *_6[_10] = 1;
>   _12 = _3 + 1600;
>   _13 = a2_5(D) + _12;
>   *_13[j_8(D)] = 1;
>
>The base_expr for the 1st and 2nd memory reference are the same, i.e. 
>_6, while the base_expr for a2[i+20][j] is _13.
>
>_13 is essentially (_6 + 800), so all of the three memory references 
>essentially share the same base address.  As their strides are also the
>
>same (MULT_EXPR (j, 4)), the three references can all be lowered to 
>MEM_REFs.  What this patch does is to use the tree affine tools to help
>
>recognize the underlying base address expression; as it requires
>looking 
>into the definitions of SSA_NAMEs, get_addr_base_and_unit_offset () 
>won't help here.
>
>Bill has helped me exploit other ways of achieving this in SLSR, but so
>
>far we think this is the best way to proceed.  The use of tree affine 
>routines has been restricted to CAND_REFs only and there is the 
>aforementioned cache facility to help reduce the overhead.
>
>Thanks,
>Yufeng
>
>P.S. some more details what the patch does:
>
>The CAND_REF for the three memory references are:
>
>  6  [2] *_6[j_8(D)] = 1;
>      REF  : _6 + ((sizetype) j_8(D) * 4) + 0 : int[20] *
>      basis: 0  dependent: 8  sibling: 0
>      next-interp: 0  dead-savings: 0
>
>   8  [2] *_6[_10] = 1;
>      REF  : _6 + ((sizetype) j_8(D) * 4) + 4 : int[20] *
>      basis: 6  dependent: 11  sibling: 0
>      next-interp: 0  dead-savings: 0
>
>  11  [2] *_13[j_8(D)] = 1;
>      REF  : _13 + ((sizetype) j_8(D) * 4) + 0 : int[20] *
>      basis: 8  dependent: 0  sibling: 0
>      next-interp: 0  dead-savings: 0
>
>Before the patch, the strength reduction candidate chains for the three
>
>CAND_REFs are:
>
>   _6 -> 6 -> 8
>   _13 -> 11
>
>i.e. SLSR recognizes the first two references share the same basis, 
>while the last one is on it own.
>
>With the patch, an extra candidate chain can be recognized:
>
>   a2_5(D) + (sizetype) i_1(D) * 80 -> 6 -> 11 -> 8
>
>i.e. all of the three references are found to have the same basis 
>(a2_5(D) + (sizetype) i_1(D) * 80), which is essentially the expanded
>_6 
>or _13, with the immediate offset removed.  The pass is now able to 
>lower all of the three references, instead of the first two only, to 
>MEM_REFs.

Ok, so slsr handles arbitrary complex bases and figures out common components? 
If so, then why not just use get_inner_reference? After all slsr does not use 
tree-affine as representation for bases (which it could?)

Richard.


Reply via email to