Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

Jiong Wang Thu, 21 May 2015 14:34:39 -0700

Jeff Law writes:

> On 05/14/2015 03:13 PM, Jiong Wang wrote:
>>
>> Jeff Law writes:
>>
>>> For all kinds of reassociation we have to concern ourselves with adding
>>> overflow where it didn't already occur.  Assuming a 32 bit architecture
>>> we could get overflow if A is 0x7fffffff, b is -4 and c = 3
>>>
>>> 0x7fffffff + -4 = 0x7ffffffb
>>> 0x7ffffffb + 3 = 0x7ffffffe
>>>
>>>
>>> If you make the transformation you're suggesting we get
>>>
>>> 0x7fffffff + 3 = 0x80000002  OVERFLOW
>>> 0x80000002 - 4 = 0x7ffffffe
>>>
>>> Now if you always know pointers are unsigned, then the overflow is
>>> defined and you'd be OK.  But that's a property of the target and one
>>> that's not well modeled within GCC (we have POINTER_EXTEND_UNSIGNED
>>> which kind of tells us something in this space).
>>
>> I see, understood, cool! Thanks for such detailed explanation.
>>
>> The above scenario may indeed happen for general pointer arithmetic
>> reassociation.
>>
>> One thing may make life easier: my reassociation is restricted to the
>> frame pointer. The "(plus (plus fp, index_reg) + const_off)" pattern
>> addresses some variable on the stack; index_reg and const_off are parts
>> of the variable's stack offset. Reassociating them means reordering two
>> parts of the stack offset. There may be a way to prove the transformation
>> will not add extra overflow risk, especially when index_reg is
>> unsigned.
>>
>> I understand that general pointer arithmetic reassociation carries a big
>> risk: the operands involved largely come from unrelated instructions,
>> there is no relationship between their values, and we can deduce
>> nothing.
> Given the special status of SP, FP and ARGP and a known constant part, 
> we can probably do something here.  More below...
>
>
>
>>
>>>
>>> In addition to worrying about overflow, you have to worry about
>>> segmented architectures with implicit segment selection -- especially if
>>> the segment selection comes from the base register rather than the entire
>>> effective address.
>>>
>>
>> Hmm, understood!
>>
>> This reminds me of something as dark as the x86 segment descriptors in
>> protected mode...
> Possibly, I've actually never studied the segmented aspects of the x86. 
>   But I'm painfully familiar with the others mentioned :(
>
> My recollection for the segmented stuff on the PA is we only had a 
> single guard page at both ends of the segment.  So we only allowed an 
> offset of +-4k when doing address reassociations in legitimize_address. 
>   This was possible because we had callouts from the right places in the 
> RTL generators/optimizers to allow targets to rewrite address 
> arithmetic.  So we could naturally bury the target details away from the 
> code generator/optimizers.
>
> So we could possibly parameterize the transformation around similar 
> concepts.  The design issue here is it's introducing more target 
> dependencies in places where we've really wanted to avoid them.  In 
> theory the gimple optimizers are supposed to be target independent. 
> Reality is some stuff bleeds into them (the one that's mentioned the 
> most often is branch costing, but there's others).
>
> *If* we decide to go forward with using some target hooks here, I'd be 
> tempted to do 2.  One that's effectively a tri-state: full reassociation, 
> limited reassociation, no reassociation.  The second would bound the 
> constants in the limited reassociation case.
>
> Thoughts?


Thanks for these thoughts.

I tried but still can't prove that this transformation will not introduce
extra pointer overflow, even though the reassociation is restricted to
vfp, although my first impression is that it will not introduce extra risk
in real applications.
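
For reference, the overflow concern quoted above can be reproduced with a
minimal C sketch. It only models 32-bit signed addition with an overflow
flag; the helper names (add32, sum_original, sum_reassociated) are made up
for illustration and are not GCC code. The narrowing cast relies on the
usual wrap-around behaviour, which is implementation-defined in C.

```c
#include <stdint.h>

/* Add two 32-bit signed values.  Sets *overflow when the exact sum does
   not fit in int32_t; the returned value is the wrapped result.  */
static int32_t add32(int32_t x, int32_t y, int *overflow)
{
    int64_t wide = (int64_t)x + (int64_t)y;
    if (wide > INT32_MAX || wide < INT32_MIN)
        *overflow = 1;
    /* Wrap to 32 bits (implementation-defined narrowing, wraps on GCC).  */
    return (int32_t)(uint32_t)wide;
}

/* Original order: (a + b) + c.  */
static int32_t sum_original(int32_t a, int32_t b, int32_t c, int *overflow)
{
    return add32(add32(a, b, overflow), c, overflow);
}

/* Reassociated order: (a + c) + b.  */
static int32_t sum_reassociated(int32_t a, int32_t b, int32_t c, int *overflow)
{
    return add32(add32(a, c, overflow), b, overflow);
}
```

With a = 0x7fffffff, b = -4 and c = 3, both orders end at the same wrapped
value, but only the reassociated order passes through an overflowing
intermediate sum.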

I have done a quick check of hppa's legitimize_address. For (plus
sym_ref, const_int), if const_int is beyond +-4K, the hook forces
them into registers, and (plus reg, reg) is always OK.

So for target hooks, my understanding of your idea is something like:

 A new hook, targetm.pointer_arith_reassociate (): if it returns -1 then
 full reassociation is supported, 0 means limited, and 1 means no
 reassociation should be done. The default version returns -1, as most
 targets are OK with reassociation provided we can prove it introduces
 no overflow risk. A target like HPPA would define this hook to return
 0 for limited support.

 Then, if targetm.pointer_arith_reassociate () returns 0, we should
 further invoke the second hook, targetm.limited_reassociate_p (rtx x),
 to check that the reassociated rtx 'x' meets the target's restrictions;
 for example, on HPPA the constant part shouldn't be beyond +-4K.

not sure whether my understanding is correct.
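
To make the proposal concrete, here is a rough standalone C sketch of how
the two hooks might fit together. Nothing in it exists in GCC: the enum,
the struct and every function name are hypothetical, a plain constant
offset stands in for the rtx argument, and the inclusive +-4096 bound is
only my assumption about what "+-4K" means.

```c
#include <stdbool.h>

/* Hypothetical tri-state, using the return values proposed above.  */
enum reassoc_kind {
    REASSOC_FULL = -1,    /* full reassociation allowed */
    REASSOC_LIMITED = 0,  /* allowed, subject to the second hook */
    REASSOC_NONE = 1      /* no reassociation */
};

/* Hypothetical stand-in for the two proposed target hooks.  A real hook
   would receive an rtx; a plain constant offset is used here.  */
struct reassoc_hooks {
    enum reassoc_kind (*pointer_arith_reassociate)(void);
    bool (*limited_reassociate_p)(long const_off);
};

/* Default target: full reassociation, no extra restriction.  */
static enum reassoc_kind default_kind(void) { return REASSOC_FULL; }
static bool default_limited_p(long const_off) { (void)const_off; return true; }

/* HPPA-like target: limited, constant part within +-4K (assumed inclusive).  */
static enum reassoc_kind hppa_like_kind(void) { return REASSOC_LIMITED; }
static bool hppa_like_limited_p(long const_off)
{
    return const_off >= -4096 && const_off <= 4096;
}

static const struct reassoc_hooks default_hooks = { default_kind, default_limited_p };
static const struct reassoc_hooks hppa_like_hooks = { hppa_like_kind, hppa_like_limited_p };

/* What a caller in the optimizer might do before reassociating.  */
static bool can_reassociate(const struct reassoc_hooks *hooks, long const_off)
{
    switch (hooks->pointer_arith_reassociate()) {
    case REASSOC_FULL:
        return true;
    case REASSOC_LIMITED:
        return hooks->limited_reassociate_p(const_off);
    case REASSOC_NONE:
    default:
        return false;
    }
}
```

The second hook is only consulted in the limited case, so targets that
allow full reassociation or none at all never need to define it.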
 
-- 
Regards,
Jiong
