Thanks for all the updates. Vladimir Makarov <vmaka...@redhat.com> writes: >>> + /* index * scale + disp => new base + index * scale */ >>> + enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH); >>> + >>> + lra_assert (INDEX_REG_CLASS != NO_REGS); >>> + new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp"); >>> + lra_assert (GET_CODE (*addr_loc) == PLUS); >>> + lra_emit_move (new_reg, *ad.disp_loc); >>> + if (CONSTANT_P (XEXP (*addr_loc, 1))) >>> + XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0); >>> + XEXP (*addr_loc, 0) = new_reg; >> The canonical form is (plus (mult ...) (reg)) rather than >> (plus (reg) (mult ...)), but it looks like we create the latter. >> I realise you try both forms here: > It might happen because equiv substitution in LRA. >>> + /* Some targets like ARM, accept address operands in >>> + specific order -- try exchange them if necessary. */ >>> + if (! valid_address_p (mode, *addr_loc, as)) >>> + { >>> + exchange_plus_ops (*addr_loc); >>> + if (! valid_address_p (mode, *addr_loc, as)) >>> + exchange_plus_ops (*addr_loc); >>> + } >> but I think we should try the canonical form first. And I'd prefer it >> if we didn't try the other form at all, especially in 4.8. It isn't >> really the backend's job to reject non-canonical rtl. This might well >> be another case where some targets need a (hopefully small) tweak in >> order to play by the rules. >> >> Also, I suppose this section of code feeds back to my question on >> Wednesday about the distinction that LRA seems to make between the >> compile-time constant in: >> >> (plus (reg X1) (const_int Y1)) >> >> and the link-time constant in: >> >> (plus (reg X2) (symbol_ref Y2)) >> >> It looked like extract_address_regs classified X1 as a base register and >> X2 as an index register. The difference between the two constants has >> no run-time significance though, and I think we should handle both X1 >> and X2 as base registers (as I think reload does). >> >> I think the path above would then be specific to scaled indices. >> In the original address the "complex" index must come first and the >> displacement second. In the modified address, the index would stay >> first and the new base register would be second. More below. > As I wrote above the problem is also in that equiv substitution can > create non-canonical forms.
Right. Just in case there's a misunderstanding: I'm not complaining about these routines internally using forms that are noncanonical (which could happen because of equiv substitution, like you say). I just think that what we eventually try to validate should be canonical. In a way it's similar to how the simplify-rtx.c routines work. If there are targets that only accept noncanonical rtl (which is after all just a specific type of invalid rtl), they need to be fixed. >>> + /* base + scale * index + disp => new base + scale * index */ >>> + new_reg = base_plus_disp_to_reg (mode, as, &ad); >>> + *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc); >>> + if (! valid_address_p (mode, *addr_loc, as)) >>> + { >>> + /* Some targets like ARM, accept address operands in >>> + specific order -- try exchange them if necessary. */ >>> + exchange_plus_ops (*addr_loc); >>> + if (! valid_address_p (mode, *addr_loc, as)) >>> + exchange_plus_ops (*addr_loc); >>> + } >> Same comment as above about canonical rtl. Here we can have two >> registers -- in which case the base should come first -- or a more >> complex index -- in which case the index should come first. >> >> We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...), >> with the operands in either order, and let it take care of the details. >> Using simplify_gen_binary would help with the earlier index+disp case too. > Equiv substitution can create non-canonical forms. There are 2 approaches: > o have a code for dealing with non-canonical forms (equiv substitution, > target stupidity) > o always support canonical forms and require them from targets. > > I decided to use the 1st variant but I am reconsidering it. I'll try to > fix before inclusion. But I am not sure I have time for this. All > these changes makes LRA unstable. In fact, I've just found that changes > I already made so far resulted in 2 SPEC2000 tests broken although GCC > testsuite and bootstrap looks good. OK. I'm happy to try fixing the noncanonical thing. >>> + /* If this is post-increment, first copy the location to the reload reg. >>> */ >>> + if (post && real_in != result) >>> + emit_insn (gen_move_insn (result, real_in)); >> Nit, but real_in != result can never be true AIUI, and I was confused how >> the code could be correct in that case. Maybe just remove it, or make >> it an assert? > No, it might be true: > > real_in = in == value ? incloc : in; > ... > if (cond) > result = incloc; > else > result = ... > > if (post && real_in != result) > > So it is true if in==value && cond Sorry, what I meant was that cond is "! post && REG_P (incloc)": if (! post && REG_P (incloc)) result = incloc; else result = lra_create_new_reg (GET_MODE (value), value, new_rclass, "INC/DEC result"); so it can never be true in the "post" case quoted above. >>> + dest_reg = SET_DEST (set); >>> + /* The equivalence pseudo could be set up as SUBREG in a >>> + case when it is a call restore insn in a mode >>> + different from the pseudo mode. */ >>> + if (GET_CODE (dest_reg) == SUBREG) >>> + dest_reg = SUBREG_REG (dest_reg); >>> + if ((REG_P (dest_reg) >>> + && (x = get_equiv_substitution (dest_reg)) != dest_reg >>> + /* Remove insns which set up a pseudo whose value >>> + can not be changed. Such insns might be not in >>> + init_insns because we don't update equiv data >>> + during insn transformations. >>> + >>> + As an example, let suppose that a pseudo got >>> + hard register and on the 1st pass was not >>> + changed to equivalent constant. We generate an >>> + additional insn setting up the pseudo because of >>> + secondary memory movement. Then the pseudo is >>> + spilled and we use the equiv constant. In this >>> + case we should remove the additional insn and >>> + this insn is not init_insns list. */ >>> + && (! MEM_P (x) || MEM_READONLY_P (x) >>> + || in_list_p (curr_insn, >>> + ira_reg_equiv >>> + [REGNO (dest_reg)].init_insns))) >> This is probably a stupid question, sorry, but when do we ever want >> to keep an assignment to a substituted pseudo? I.e. why isn't this just: >> >> if ((REG_P (dest_reg) >> && (x = get_equiv_substitution (dest_reg)) != dest_reg) > Equivalence can be memory location. It means you can assign many > different values to the location. Removing them would generate a wrong > code. For example, if you use your simple variant of code, you will > have > 100 test failures of GCC testsuite. OK :-) >>> +/* Return true if we need a split for hard register REGNO or pseudo >>> + REGNO which was assigned to a hard register. >>> + POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be >>> + used for reloads since the EBB end. It is an approximation of the >>> + used hard registers in the split range. The exact value would >>> + require expensive calculations. If we were aggressive with >>> + splitting because of the approximation, the split pseudo will save >>> + the same hard register assignment and will be removed in the undo >>> + pass. We still need the approximation because too aggressive >>> + splitting would result in too inaccurate cost calculation in the >>> + assignment pass because of too many generated moves which will be >>> + probably removed in the undo pass. */ >>> +static inline bool >>> +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno) >>> +{ >>> + int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : >>> reg_renumber[regno]; >>> + >>> + lra_assert (hard_regno >= 0); >>> + return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) >>> + && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno) >>> + && (usage_insns[regno].reloads_num >>> + + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num) >>> + && ((regno < FIRST_PSEUDO_REGISTER >>> + && ! bitmap_bit_p (&ebb_global_regs, regno)) >>> + || (regno >= FIRST_PSEUDO_REGISTER >>> + && lra_reg_info[regno].nrefs > 3 >>> + && bitmap_bit_p (&ebb_global_regs, regno)))) >>> + || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno))); >>> +} >> Could you add more commentary about the thinking behind this particular >> choice of heuristic? E.g. I wasn't sure what the reloads_num check did, >> or why we only split hard registers that are local to the EBB and only >> split pseudos that aren't. >> >> The 2 and 3 numbers seemed a bit magic too. I suppose the 2 has >> something to do with "one save and one restore", but I wasn't sure >> why we applied it only for pseudos. (AIUI that arm of the check >> deals with "genuine" split pseudos rather than call saves & restores.) >> >> Still, it says a lot for the high quality of LRA that, out of all the >> 1000s of lines of code I've read so far, this is the only part that >> didn't seem to have an intuitive justification. > Yes, right. I checked many parameters and finally picked out the above > ones. What I found is that aggressive and even moderate splitting > usually result in worse code. Most splitting is undone and if we do > splitting aggressively we generate a lot of insns which will be removed > but their costs are taken into account during assignment pass. > Inheritance has quite bigger chance to successful. One reason why > splitting is undone is that spilling + inheritance of short living > pseudos is a substitution of splitting in some way. Therefore I do > splitting for long living pseudos. That means non local pseudos. > > I added the following comments. > > return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) > && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno) > /* We need at least 2 reloads to make pseudo splitting > profitable. We should provide hard regno splitting in > any case to solve 1st insn scheduling problem when > moving hard register definition up might result in > impossibility to find hard register for reload pseudo of > small register class. */ > && (usage_insns[regno].reloads_num > + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num) > && (regno < FIRST_PSEUDO_REGISTER > /* For short living pseudos, spilling + inheritance can > be considered a substitution for splitting. > Therefore we do not splitting for local pseudos. It > decreases also aggressiveness of splitting. The > minimal number of references is chosen taking into > account that for 2 references splitting has no sense > as we can just spill the pseudo. */ > || (regno >= FIRST_PSEUDO_REGISTER > && lra_reg_info[regno].nrefs > 3 > && bitmap_bit_p (&ebb_global_regs, regno)))) > > I removed checking that hard_regno is local. It was a parameter to > decrease aggressiveness of hard register splitting but I realized that > it contradicts the reason for splitting to solve 1st insn scheduling > problem because regional scheduling can move hard register definition > beyond BB. Thanks, this looks much better to me FWIW. Richard