Andrew Stubbs <a...@codesourcery.com> writes:
> On 17/04/12 18:20, Richard Sandiford wrote:
>> Andrew Stubbs <a...@codesourcery.com> writes:
>>> Hi all,
>>>
>>> I can see why copying from one pseudo-register to another would not be a
>>> reason *not* to decompose a register, but I don't understand why this is
>>> a reason to say it *should* be decomposed.
>>
>> The idea is that, if a backend implements an N-word pseudo move using
>> N word-mode moves, it is better to expose those moves before register
>> allocation.  It's easier for RA to find N separate word-mode registers
>> than a single contiguous N-word one.
>
> Ok, I think I understand that, but it seems slightly wrong to me.
>
> It makes sense to lower *real* moves, but before the fwprop pass there 
> are quite a lot of pseudos that only exist as artefacts of the expand 
> process. Moving the subreg1 pass after fwprop1 would probably do the 
> trick, but that would probably also defeat the object of lowering early.
>
> I've done a couple of experiments:
>
> First, I tried adding an extra fwprop pass before subreg1. I needed to 
> move up the dfinit pass also to make that work, but then it did work: it 
> successfully compiled my testcase without a regression.
>
> I'm not sure that adding an extra pass isn't overkill, so second I tried 

Yeah, sounds rather expensive :-)

> adjusting lower-subreg to avoid this problem; I modified 
> find_pseudo_copy so that it rejected copies that didn't change the mode, 
> on the principle that fwprop would probably have eliminated the move 
> anyway. This was successful also, and a much less expensive change.
>
> Does that make sense? The pseudos involved in the move will still get 
> lowered if the other conditions hold.

The problem is that register moves are not always going to be
eliminated, even when no mode changes are involved.  It might make
sense to restrict the code you quoted:

            case SIMPLE_PSEUDO_REG_MOVE:
              if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
                bitmap_set_bit (decomposable_context, regno);
              break;

to the second pass though.
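
A minimal sketch of that restriction, written as a standalone model
rather than as the real lower-subreg.c code.  Everything prefixed
toy_/TOY_ is hypothetical, including the first_pass_p flag; the bool
array stands in for the decomposable_context bitmap and the
modes_tieable argument stands in for the MODES_TIEABLE_P check:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a move, loosely mirroring the case
   label in the quoted snippet.  Not GCC's real enum.  */
enum toy_move_kind
{
  TOY_NOT_SIMPLE_MOVE,
  TOY_SIMPLE_PSEUDO_REG_MOVE,
  TOY_SIMPLE_MOVE
};

#define TOY_MAX_REGNO 64

/* Stand-in for the decomposable_context bitmap.  */
bool toy_decomposable_context[TOY_MAX_REGNO];

/* Mark REGNO as worth decomposing because it takes part in a
   pseudo-to-pseudo copy, but only when this is not the first
   lower-subreg pass.  Returns true if the bit was set.  */
bool
toy_note_pseudo_copy (enum toy_move_kind kind, unsigned int regno,
                      bool modes_tieable, bool first_pass_p)
{
  assert (regno < TOY_MAX_REGNO);

  if (kind != TOY_SIMPLE_PSEUDO_REG_MOVE)
    return false;

  /* Assumed restriction: before fwprop has run, many copies are just
     artefacts of expand, so do not treat them as a reason to
     decompose.  */
  if (first_pass_p)
    return false;

  /* Same condition as in the quoted snippet.  */
  if (!modes_tieable)
    return false;

  toy_decomposable_context[regno] = true;
  return true;
}
```

With this shape the first pass still decomposes pseudos for the other
reasons (such as subreg-only uses); only the copy-based trigger is
deferred to the second pass.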

>> The problem is the "if a backend implements ..." bit: the current code
>> doesn't check.  This patch:
>>
>>      http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00094.html
>>
>> should help.  It's still waiting for me to find a case where the two
>> possible ways of handling hot-cold partitioning behave differently.
>
> I've not studied that patch in detail, but I'm not sure it'll help. In 
> most cases, including my testcase, lowering is the correct thing to do 
> if NEON (or IWMMXT, perhaps) is not enabled.

Right.  I think I misunderstood, sorry.  I thought this regression was
for NEON only, but do you mean that adding these NEON patterns introduces
the regression for non-NEON targets as well?

> When NEON is enabled, however, it may still be the right thing to do:
> NEON does not provide a full set of DImode operations. The test for
> subreg-only uses ought to be enough to differentiate, once the
> extraneous pseudos such as the one in my testcase have been dealt
> with.

OK.  If/when that patch goes in, the ARM backend is going to have
to pick an rtx cost for DImode SETs.  It sounds like the cost will need
to be twice an SImode move regardless of whether or not NEON is enabled.
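
Purely as an illustration of that cost relationship, and not of the
actual ARM backend or of the interface in the patch above, here is a
toy model.  All names are hypothetical; TOY_COSTS_N_INSNS is only
modelled on the scaling that GCC's COSTS_N_INSNS macro does:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost unit: one "instruction" is worth 4 units.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

enum toy_mode { TOY_SImode, TOY_DImode };

/* Cost of a register-to-register SET in MODE.  NEON_ENABLED is
   deliberately ignored: the point is that a DImode SET is costed as
   two SImode moves either way.  */
int
toy_set_cost (enum toy_mode mode, bool neon_enabled)
{
  (void) neon_enabled;

  switch (mode)
    {
    case TOY_SImode:
      return TOY_COSTS_N_INSNS (1);
    case TOY_DImode:
      return TOY_COSTS_N_INSNS (2);
    }

  /* Not reached for valid enum values.  */
  assert (false);
  return 0;
}
```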

Richard
