On 26.01.2016 22:18, Richard Sandiford wrote: > [cc-ing Eric as RTL maintainer] > > Matthew Fortune <matthew.fort...@imgtec.com> writes: >> Bernd Edlinger <bernd.edlin...@hotmail.de> writes: >>> Matthew Fortune <matthew.fort...@imgtec.com> writes: >>>> Has the patch been tested beyond just building GCC? I can do a >>>> test run for you if you don't have things set up to do one yourself. >>> >>> I built a cross-gcc with all languages and a cross-glibc, but I have >>> not set up an emulation environment, so if you could give it a test >>> that would be highly welcome. >> >> mipsel-linux-gnu test results are the same before and after this patch. >> >> Please go ahead and commit. > > I still object to this. And it feels like the patch was posted > as though it was a new one in order to avoid answering the objections > that were raised when it was last posted: > > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html >
Richard, I am really sorry when you feel now like I did not take your objections seriously. Let me first explain what happened from my point of view: When I posted this response to your objections here: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html I was waiting for your response, but nothing happened, so I kind of forgot about the issue. In the meantime Ubuntu and Debian began to roll out GCC-6 and they got stuck at the same issue, but instead of waiting for pr69012 to be eventually resolved they created a duplicate pr69129, and then last Thursday Nick applied my initial patch without my intervention, probably because of the pressure they put on us. That changed significantly how I looked at the issue after that point, as there was no actual regression anymore, the revised patch still made sense, but for another reason. When you look at the 20+ targets in the gcc tree you'll see that almost all of them have a frame-layout computation function and all except mips have a shortcut "if (reload_completed) return;" in that function. And OTOH mips has one of the most complicated frame layout functions of all targets. For all of these reasons I posted a new patch which tries to resolve differences between mips and other targets inital_elimination_offset functions. I still think that it is not OK for the mips target to do the frame layout already in mips_frame_pointer_required because the frame layout will change several times, until reload is completed, and that function is only called right in the beginning. And I think that it is not really well-designed to have a frame layout function F(x,y)->z unless you assume F to be a trivial O(1) function that has no significant computation overhead. When you assume F to be an expensive O(N) or higher complexity you would design the interface like compute_frame_layout_now() and get_cached_initial_elimination_offset(x,y)->z. Also note, that with the current design, of initial_elimination_offset the frame layout is already computed several times, because reload has to call the function with different parameters, and there is no way how to know when the cached value can be used and when not. I do also think that having to cache the initial elimination offset values in the back-end that are already cached in the target does not improve anything. Then I would prefer to have a set of new target callbacks that makes it easy to access the cached target values. If you want to propose a patch for that I will not object, but I doubt it will be suitable for Stage 4. Thanks Bernd. > IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large > bit of LRA/reload logic: > > ------------------------------------------------------------------------ > /* Compute an approximation for the offset between the register > FROM and TO for the current function, as it was at the start > of the routine. */ > > static HOST_WIDE_INT > get_initial_register_offset (int from, int to) > { > #ifdef ELIMINABLE_REGS > static const struct elim_table_t > { > const int from; > const int to; > } table[] = ELIMINABLE_REGS; > HOST_WIDE_INT offset1, offset2; > unsigned int i, j; > > if (to == from) > return 0; > > /* It is not safe to call INITIAL_ELIMINATION_OFFSET > before the reload pass. We need to give at least > an estimation for the resulting frame size. */ > if (! reload_completed) > { > offset1 = crtl->outgoing_args_size + get_frame_size (); > #if !STACK_GROWS_DOWNWARD > offset1 = - offset1; > #endif > if (to == STACK_POINTER_REGNUM) > return offset1; > else if (from == STACK_POINTER_REGNUM) > return - offset1; > else > return 0; > } > > for (i = 0; i < ARRAY_SIZE (table); i++) > if (table[i].from == from) > { > if (table[i].to == to) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > return offset1; > } > for (j = 0; j < ARRAY_SIZE (table); j++) > { > if (table[j].to == to > && table[j].from == table[i].to) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to, > offset2); > return offset1 + offset2; > } > if (table[j].from == to > && table[j].to == table[i].to) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to, > offset2); > return offset1 - offset2; > } > } > } > else if (table[i].to == from) > { > if (table[i].from == to) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > return - offset1; > } > for (j = 0; j < ARRAY_SIZE (table); j++) > { > if (table[j].to == to > && table[j].from == table[i].from) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to, > offset2); > return - offset1 + offset2; > } > if (table[j].from == to > && table[j].to == table[i].from) > { > INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to, > offset1); > INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to, > offset2); > return - offset1 - offset2; > } > } > } > > /* If the requested register combination was not found, > try a different more simple combination. */ > if (from == ARG_POINTER_REGNUM) > return get_initial_register_offset (HARD_FRAME_POINTER_REGNUM, to); > else if (to == ARG_POINTER_REGNUM) > return get_initial_register_offset (from, HARD_FRAME_POINTER_REGNUM); > else if (from == HARD_FRAME_POINTER_REGNUM) > return get_initial_register_offset (FRAME_POINTER_REGNUM, to); > else if (to == HARD_FRAME_POINTER_REGNUM) > return get_initial_register_offset (from, FRAME_POINTER_REGNUM); > else > return 0; > > #else > HOST_WIDE_INT offset; > > if (to == from) > return 0; > > if (reload_completed) > { > INITIAL_FRAME_POINTER_OFFSET (offset); > } > else > { > offset = crtl->outgoing_args_size + get_frame_size (); > #if !STACK_GROWS_DOWNWARD > offset = - offset; > #endif > } > > if (to == STACK_POINTER_REGNUM) > return offset; > else if (from == STACK_POINTER_REGNUM) > return - offset; > else > return 0; > > #endif > } > ------------------------------------------------------------------------ > > Under the current interface macros like INITIAL_ELIMINATION_OFFSET > are expected to trigger the calculation of the target's frame layout > (since you need that information to answer the question). > To me it seems wrong that we're attempting to call that sort of > macro in a query routine like rtx_addr_can_trap_p_1. > > The frame layout is fixed for each LRA/reload cycle. The layout > for the last of those cycles carries forward through the rest > of compilation. > > IMO we should cache the information we need at the start of each > LRA/reload cycle. This will be more robust, because there will > be no accidental changes to global state either during or after > LRA/reload. It will also be more efficient because > rtx_addr_can_trap_p_1 can read the cached variables rather > than calling back into the target. > > Thanks, > Richard >