2015-04-16 19:47 GMT+03:00 Georg-Johann Lay <a...@gjlay.de>:
>
> On 04/16/2015 at 11:28 AM, Senthil Kumar Selvaraj wrote:
>>
>> On Thu, Apr 16, 2015 at 11:02:05AM +0200, Georg-Johann Lay wrote:
>>>
>>> On 04/16/2015 at 08:43 AM, Senthil Kumar Selvaraj wrote:
>>>>
>>>> This patch fixes PR 65657.
>>>
>>> The following artifact appears to be PR63633.
>>>
>>
>> I did see that one - unfortunately, that fix won't help here. IIUC, you
>> check whether input/output operand hard regs are in the clobber list,
>> and if so, you generate pseudos to save and restore the clobbered hard
>> regs.
>>
>> In this case, the reg is actually clobbered by a different insn (one
>
> Arrgh, yes...
>
>> that loads the next argument to the function). So unless I blindly
>> generate pseudos for all regs in the clobber list, the clobbering will
>> still happen.
>>
>> FWIW, I did try saving/restoring all clobbered regs, and it did fix the
>> problem - it just looked like a (worse) hack to me. Aren't we manually
>> replicating what RA/reload should be doing?
>
> As it appears, we'll have to do it by hand. The attached patch is just a
> sketch that indicates how the problem could be approached. Notice the new
> assertions in the split expanders; they will trigger an ICE until the fix
> is actually installed.
>
> The critical insns are generated in the movMM expander and are changed to
> have no clobbers (SCRATCHes, actually). In a later pass, when appropriate
> life info can be made available, run yet another avr pass that does the
> following:
>
> 1a) Save/restore needed hard regs around the insn.
>
> 2a) Kick out hard regs overlapping the clobbers, e.g. in set_src, into new
> pseudos. Maybe that could happen due to some hard reg propagation, or we
> can use a new predicate similar to combine_pseudo_register_operand.
>
> 3) Replace scratch -> hard regs for all scratch_operands.
>
> 2b) Restore respective hard regs from their pseudos.
>
> 1b) Restore respective hard regs from their pseudos.
>
> And maybe we can get better code by allocating things like address
> registers by hand.
>
> When I implemented some of the libgcc insns, I tried to express the
> operands by means of constraints, e.g. for (reg:HI 22), and let the
> register allocator do the job.
>
> The resulting code was functional but *horrific*.
>
> The register allocator is not yet ready to generate efficient code in such
> demanding situations...
>
>>
>> What do you think?
>>
>
> IMO sooner or later we'll need such an infrastructure, maybe also for
> non-mov insns that are implemented by transparent libcalls like divmod,
> mul, etc.
>
>>>> When cfgexpand.c expands a function call, it first figures out the
>>>> registers in which the arguments will go, followed by expansion of the
>>>> arguments themselves (right to left). It later emits mov insns to set
>>>> the precomputed registers with the expanded RTXes.
>>>>
>>>> If one of the arguments is a __memx char pointer dereference, the mov
>>>> eventually expands to gen_xload<mode>_A (for certain cases), which
>>>> clobbers R22, R21 and Z. This causes problems if one of those
>>>> clobbered registers was precomputed to hold another argument.
>>>>
>>>> In general, call expansion does not appear to take clobbers into account -
>
> We had been warned that using hard regs is evil... But without that
> technique the code quality would decrease way too much.
>
>>>> when it checks for argument overlap, the RTX (args[i].value) is only a MEM
>>>> in QImode for the memx deref - the clobber shows up only when it eventually
>>>> calls emit_move_insn, at which point it is too late.
>
> Such situations could only be handled by a target hook that allowed
> expanding specific trees by hand... Such a hook could cater for insns that
> must use hard registers.
>
>>>> This does not happen for an int pointer dereference - it turns out that
>>>> precompute_register_parameters does a copy_to_mode_reg if the
>>>> cost of args[i].value is more than COSTS_N_INSNS(1), i.e. it creates a
>>>> pseudo and later assigns the pseudo to the arg register. This is done
>>>> before any moves to arg registers are done, so other arguments are not
>>>> overwritten.
>>>>
>>>> Doing the same thing, i.e. providing a better cost estimate for a MEM rtx
>>>> in a non-default address space, makes this problem go away, and that is
>>>> what this patch does. Regression testing does not show any new failures.
>
> Can you say something about the overall code quality? If it is not
> significantly worse, then I'd propose to apply your rtx-costs solution
> soon. The full fix will take more time to work out.
I agree with Georg.