2015-04-16 19:47 GMT+03:00 Georg-Johann Lay <a...@gjlay.de>:
>
> Am 04/16/2015 um 11:28 AM schrieb Senthil Kumar Selvaraj:
>>
>> On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
>>>
>>> Am 04/16/2015 um 08:43 AM schrieb Senthil Kumar Selvaraj:
>>>>
>>>> This patch fixes PR 65657.
>>>
>>>
>>> The following artifact appears to be PR63633.
>>>
>>
>> I did see that one - unfortunately, that fix won't help here. IIUC, you
>> check if input/output operand hard regs are in the clobber list,
>> and if yes, you generate pseudos to save and restore clobbered hard
>> regs.
>>
>> In this case, the reg is actually clobbered by a different insn (one
>
>
> Arrgh, yes...
>
>> that loads the next argument to the function). So unless I blindly generate 
>> pseudos for
>> all regs in the clobber list, the clobbering will still happen.
>>
>> FWIW, I did try saving/restoring all clobbered regs, and it did fix the
>> problem - just that it appeared like a (worse) hack to me. Aren't we
>> manually replicating what RA/reload should be doing?
>
>
> As it appears, we'll have to do it by hand.  The attaches patch is just a 
> sketch that indicates how the problem could be approached.  Notice the new 
> assertions in the split expanders; they will throw ICE until the fix is 
> actually installed.
>
> The critical insn are generated in movMM expander and are changed to have no 
> clobbers (SCRATCHes actually).  An a later pass, when appropriate life info 
> can be made available, run yet another avr pass that
>
> 1a) Save-restore needed hard regs around the insn.
>
> 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new 
> pseudos.  Maybe that could happen due to some hard regs progagation, or we 
> can use a new predicate similar combine_pseudo_register_operand.
>
> 3) Replace scratch -> hard regs for all scratch_operands.
>
> 2b) Restore respective hard regs from their pseudos.
>
> 1b) Restore respective hard regs from their pseudos.
>
>
> And maybe we can get better code by allocating things like address register 
> by hand and get better code then.
>
> When I implemented some of the libgcc insns I tried to express the operand by 
> means of constraints, e.h. for (reg:HI 22) and let register allocator do the 
> job.
>
> The resulting code was functional but *horrific*.
>
> The register allocator is not yet ready to generate efficient code in such 
> demanding situations...
>
>>
>> What do you think?
>>
>
> IMO sooner or later we'll need such an infrastructure; maybe also for non-mov 
> insn that are implemented by transparent libcalls like divmod, mul, etc.
>
>>>> When cfgexpand.c expands a function call, it first figures out the
>>>> registers in which the arguments will go, followed by expansion of the
>>>> arguments themselves (right to left). It later emits mov insns to set
>>>> the precomputed registers with the expanded RTXes.
>>>>
>>>> If one of the arguments is a __memx char pointer dereference, the mov
>>>> eventually expands to gen_xload<mode>_A (for certain cases), which
>>>> clobbers R22, R21 and Z.  This causes problems if one of those
>>>> clobbered registers was precomputed to hold another argument.
>>>>
>>>> In general, call expansion does not appear to take clobbers into account -
>
>
> We had been warned that using hard regs is evil...  But without that 
> technique the code quality would decrease way too much.
>
>>>> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
>>>> in QImode for the memx deref - the clobber shows up when it eventually
>>>> calls emit_move_insn, at which point, it is too late.
>
>
> Such situations could only be handled by a target hook which allowed to 
> expand specific trees by hand...  Such a hook could cater for insn that must 
> use hard registers.
>
>>>> This does not happen for a int pointer dereference - turns out that
>>>> precompute_register_parameters does a copy_to_mode_reg if the
>>>> cost of args[i].value is more than COSTS_N_INSNS(1) i.e., it creates a
>>>> pseudo and later assigns the pseudo to the arg register. This is done
>>>> before any moves to arg registers is done, so other arguments are not
>>>> overwritten.
>>>>
>>>> Doing the same thing - providing a better cost estimate for a MEM rtx in
>>>> the non-default address space, makes this problem go away, and that is
>>>> what this patch does. Regression testing does not show any new failures.
>
>
> Can you tell something about overall code quality? If it is not significantly 
> worse then I'd propose to apply your rtx-costs solution soon.  The full fix 
> will take more time to work it out.

I'm agree with Georg.

Reply via email to