Robert Suchanek <robert.sucha...@imgtec.com> writes: >> FYI, all other targets that have LRA optionally selectable or deselectable >> use -mno-lra for this (even when -mlra is the default), it would be better >> for consistency not to invent new switch names for that. > > Agreed. > >>> - return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) >>> == 8; >>> + return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8; >>> >>> return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno); >>> } >> Not sure about this one. We would need to update the comment that >> explains why "!strict_p" is there, but AFAIK reason (1) would still apply. >> >> Was this needed for correctness or because it gave better code? > > "!strict_p" has been removed because of correctness issue. When LRA > validates memory addresses pseudos are temporarily eliminated to hard > registers (if possible) but the target hook is always called as > non-strict. This only affects MIPS16 instructions with not directly > accessible $sp. The strict variant, as I understand, was used in the > reload pass to indicate if a pseudo-register has been allocated a hard > register. Unless LRA should be setting the strict/non-strict depending > on whether a temporal elimination to hard reg was successful or there > is something else that I missed?
Hmm, OK, in that case I agree reason (2) doesn't apply. That part was always more of a consistency thing anyway, so I agree it's not worth keeping around for reload. I also had a look to see why instantiate_virtual_regs_in_insn didn't complain about cases like: struct s { unsigned char c; }; void foo (int, int, int, int, struct s); void bar (struct s *ptr) { foo (1, 2, 3, 4, *ptr); } and I think it's because of the later: 2008-02-14 Michael Matz <m...@suse.de> PR target/34930 * function.c (instantiate_virtual_regs_in_insn): Reload address before falling back to reloading the whole operand. which correctly reloads the address if necessary. So yeah, I agree this is right after all, sorry. Let's delete the comment starting at "There are two problems here:" at the same time. >>> + M16F_REGS, /* mips16 + frame */ >> >> Constraints are supposed to be operating on real registers, after >> elimination, so it seems odd to include a fake register. What went >> wrong with just M16_REGS? > > Only the stack pointer has been added to M16_REGS. Sorry, I'd read "frame" as meaning "$frame", the soft frame pointer. I agree M16_REGS + $sp is OK. mips_regno_to_class should then map $sp to the new class, since it's now the smallest containing class. (We really should set that up automatically one day...) > A number of patterns need to accept it otherwise LRA inserts a lot of > reloads and the code size goes up by about 10%. The change does have > also a positive effect on reload but marginally. "frame" meant to > indicate inclusion of both the stack and hard frame pointers in the > class but perhaps I should name it differently to avoid confusion. How about M16_SP_REGS, to match M16_T_REGS? Also, the BASE_REG_CLASS/ADDR_REG_CLASS distinction isn't all that obvious from the names. ADDR_REG_CLASS is only needed for the "d" constraint so maybe we could just use TARGET_MIPS16 ? M16_REGS : GR_REGS directly for now. >>> + SPILL_REGS, /* All but $sp and call preserved regs are in here */ >>... >>> + { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, >>> 0x00000000 }, /* SPILL_REGS */ \ >> >> These two don't seem to match. I think literally it would be 0x0300fffc, >> but maybe you had to make SPILL_REGS a superset of M16_REGs? > > I initially used 0x0300fffc but did some experiments and it turned out > that 0x0003fffc (with $16, $17 regs) gives slightly better code. I > haven't updated the comment though. I can imagine including all M16_REGS makes sense, but it seems odd to drop the 2 temporaries. Does 0x0303fffc have the same problem? > There is yet more to do and need to return to another thread with > MIPS16 at some point as I found some limitations of IRA/LRA to > generate better code. $8-$15 are currently inaccessible as temporary > storage because these registers are marked as fixed (when optimizing > for size) but leaving them as fixed are better for the code size. I > don't expect a big gain by using hard registers for spilling but it > more likely to improve the performance. Hmm, marking them fixed was supposed to be a temporary reload-only thing, until the move to LRA. It should never be worse to spill to these GPRs over spilling to the stack, if the value isn't live across a call. But that certainly doesn't need to be part of the initial patch. >>> +/* Add costs to hard registers based on frequency. This helps to negate >>> + some of the reduced cost associated with argument registers which >>> + unfairly promotes their use and increases register pressure */ >>> +#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO) \ >>> + (TARGET_MIPS16 && optimize_size \ >>> + ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0) >> >> So we would be trying to use, say, $4 for the first incoming argument >> even after it had been spilled? Hmm. >> >> Since this change is aimed specifically at one heuristic, I wonder >> whether we should parameterise that heuristic somehow rather than >> try to use a general hook to undo it. But I don't think there's >> anything particularly special about MIPS16 here, so maybe it's too >> eager for all targets. > > In a number of cases argument registers appeared to be unfairly promoted > increasing the register pressure and increasing the number of reloads. > Bumping up the cost of using those registers encourages IRA to spill > into memory but this appears to help LRA to do a better allocation. Of course, > not always it is a win but generally the gain outweighs the loss. > > I've seen an codesize improvements for other optimization levels > but I'm unsure whether we should make this change generic. > This part of the patch is not crucial though and can be send separately. OK, thanks, doing it separately sounds better. >>> (define_insn "*and<mode>3_mips16" >>> ... >> >> I think we want to keep the LWU case at the very least, since I assume >> otherwise we'll load the full 64-bit spill slot and use a pair of shifts >> to zero-extend it. Reloading the stack address into a base register >> should always be better than that. >> >> I agree it's less clear for the byte and halfword cases. All other >> things -- including size -- being equal, reloading a stack address into >> a base register and using an extending load should be better than >> reloading the full spill slot and extending it, since the reloaded stack >> address is more likely to be reused in other instructions. >> >> The original MIPS16 didn't have ZEB and ZEH, so on the basis above, >> I think using LBU and LHU there was probably a win. I can imagine >> it making sense to disable LBU and LHU for MIPS16e though. >> >> It might be better to do any changes to this pattern as a follow-on >> patch, since I think LRA should cope with it as-is. > > Keeping LWU case should be fine. Alternatives were removed because > LRA was ICEing for the same reason of $sp not accessible for the byte > and halfword cases. I tried to find a testcase to trigger LWU case > but couldn't. I presume it must be a rare case? The problem with other > alternatives was that spilled pseudos were changed to memory without > checking if the use of $sp is valid. On the other hand, keeping LWU > may only delay triggering the problem because the stack pointer > should not be used. It could be a missing case in LRA too. Did you see the failures even after your mips_regno_mode_ok_for_base_p change? LRA should know how to reload a "W" address. >>> diff --git gcc/rtlanal.c gcc/rtlanal.c >>> index 7a9efec..f699d17 100644 >>> --- gcc/rtlanal.c >>> +++ gcc/rtlanal.c >>> @@ -5577,7 +5577,8 @@ get_base_term (rtx *inner) >>> inner = strip_address_mutations (&XEXP (*inner, 0)); >>> if (REG_P (*inner) >>> || MEM_P (*inner) >>> - || GET_CODE (*inner) == SUBREG) >>> + || GET_CODE (*inner) == SUBREG >>> + || CONSTANT_P (*inner)) >>> return inner; >>> return 0; >>> } >> >> This doesn't look right; the general form is base+index+displacement+segment, >> with the constant term being treated as the displacement. > > I'm not particularly happy with this either. This was an attempt to fix an > ICE for > the following RTL (gcc.dg/torture/asm-subreg-1.c compiled with -mips32r2 > -mips16 -O1): > > (insn 9 8 0 2 (asm_operands/v ("") ("") 0 [ > (mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [ > (const_int 0 [0]) > ] UNSPEC_GP)) > (symbol_ref:SI ("_const_32") [flags 0x6] <var_decl > 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])] > [(asm_input:HI ("X") (null):0)] > [] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil)) > > Any suggestions to handle this case? Thanks for the pointer. I think this shows a more fundamental problem with the handling of "X" constraints. With something like: void foo (int **x, int y, int z) { int *ptr = *x + y * z / 11; __asm__ __volatile__ ("" : : "X" (*ptr)); } the entire expression gets treated as a MEM address, which neither reload nor LRA can handle. And with something like that, it isn't obvious what class all the registers in the address should have. With a sufficiently-complicated expression you could run out of registers. So perhaps we should limit the propagation to things that decompose_mem_address can handle. Thanks, Richard