Hi all, I've copied you as you have each made some significant change to a function in LRA which I guess makes you de-facto experts.
I've spent a while researching the history of simplify_operand_subreg and in particular the behaviour for subregs of memory. For my sake if no-one else here is a rundown of its evolution; corrections welcome. (r192719 git:c6a6cda) The original code identified a few special cases where a subreg could be trivially eliminated. Otherwise it introduced a reload for the inner expression in expectation of the new subreg(reload-reg) to be handled as part of operand reloading or get eliminated on the next iteration (r198344 git:ea99c7a) A special case for an LRA introduced subreg was added (LRA_SUBREG_P) that should always be considered valid. This I believe is to cope with cases where a there are operands required to match but with different modes and, presumably, one of the modes is not actually allowed. Not 100% sure what this is though! (r203169 git:c533414) A special case for a paradoxical subreg (wider than a word) was added to handle the case where the outer mode requires more registers than are available in the allocno class. There is a seemingly pointless call to PUT_MODE in this which from what I can see will set the same mode as the reload register was just created with: "PUT_MODE (new_reg, mode);" (r207007 git:9c8190e) A special case to ensure a subreg(pseudo-reg) will get spilled to avoid looping in LRA. It now starts to get interesting. (r211715 git:1a68e83) Only simplify a subreg of memory if the new address is valid or that the original address wasn't valid. (r220297 git:1aae95e) Introduces 'innermode' local which as far as I can tell is going to be identical to the reg_mode argument to simplify_operand_subreg. References to reg_mode are not fixed as part of this. This is a significant source of confusion in the code. Also simplifies subreg of CONSTANT_P expressions. (r233107 git:401bd0c) Also allow elimination of a subreg of memory when the address component of the new MEM can be reloaded. Note that at this point we are specifically allowing new MEMs to exist with invalid addresses that will be fixed up later but this is OK as they will get reloaded. (r239342 git:2d2b78a) Introduces MEM subreg simplification if the inner mode access would have been slow anyway (a check that gets fixed in d7671d7). Also introduces a double reload to satisfy some very specific rs6000 backend requirements. This case requires an outer-mode reload but is implemented as a double reload rather than just doing the outer mode reload and letting the next iteration take care of the following step. It is not clear whether this change actually fixes the same reported issues in two different ways due to the change on the condition for simplifying the MEM subreg and the double reload trick. This change also affects almost all MEM subreg reloads as previously any MEM subreg that was not simplified would have used the common innermode reload logic later in the function; see code guarded by: || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg)) but now many, if not all, use the double reload logic which is not always necessary. I believe this change needs partly reverting and redoing as a special case to insert an outermode reload for a MEM subreg if the goal class for the operand has no register suitable for inner mode, then wait for the following iteration to deal with the MEM subreg simplification. (Just a theory for now.) (r242554 git:d7671d7) Fixes the new case of MEM subreg simplification from (r239342 git:2d2b78a) (r243782 git:856bd6f) This is another case of multiple changes where some were not critical and overall there is a dangerous one here I believe. The primary aim of this change is to reload the address before reloading the inner subreg. This appears to be a dormant bug since day1 as the original logic would have failed when reloading an inner mem if its address was not already valid. The potentially fatal part of this change is the introduction of a "return false" in the MEM subreg simplification which is placed immediately after restoring the original subreg expression. I believe that if control ever actually reaches this statement then LRA would infinite loop as the MEM subreg would never be simplified. With the additional cases handled by virtue of (401bd0c) then I believe there are very few reasons why control would reach this point but presumably a MEM subreg on an operand with a special memory constraint is one such case that would be fatal. So what are the next steps! 1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is currently broken by the existing code. PR78660 2) [BUG] Remove the return false introduced in (r243782 git:856bd6f). 3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode with innermode. Rename 'reg' to 'inner' and 'operand' to 'outer' and 'mode' to 'outermode'. 4) [OPTIMISATION] Change double-reload logic so that it just deals with the special outermode reload without adjusting the subreg. 5) [??] Determine if big endian still needs a special case like in reload? Comments anyone? In an attempt to make a minimal change I propose the following as it allows WORD_REGISTER_OPERATIONS targets to benefit from the invalid address reloading fix. I think the check would be more appropriately placed on the outer-most if (MEM_P (reg)) but this would affect the handling of many more subregs which seems too dangerous at this point in release. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 393cc69..771475a 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode reg_mode) equivalences in function lra_constraints) and because for spilled pseudos we allocate stack memory enough for the biggest corresponding paradoxical subreg. */ - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) + if (!WORD_REGISTER_OPERATIONS + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) return true; *curr_id->operand_loc[nop] = operand; The change will affect at least arc,mips,rx,sh,sparc though I haven't checked which of these default on for LRA just that they can turn on LRA. I'll post this as a patch with appropriate updates to comments unless anyone raises some issues. Thanks, Matthew