On Wed, 27 Jan 2021, Jakub Jelinek wrote: > On Wed, Jan 27, 2021 at 04:16:22PM +0100, Richard Biener wrote: > > I can check but all immediate first uses of mem_addr are in > > true_dependece_1 which does x_addr = get_addr (x_addr); as the > > first thing on it. So the concern would be that > > get_addr (get_addr (x_addr)) != get_addr (x_addr) which I think > > shouldn't be the case (no semantic difference at least). > > I guess you're right, indeed I can't find other uses of mem_addr > and it isn't saved, but just used in that one function, though > perhaps against more than one addr. > And in get_addr we most likely get into the: > > if (GET_CODE (x) != VALUE) > { > if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS) > && GET_CODE (XEXP (x, 0)) == VALUE > && CONST_SCALAR_INT_P (XEXP (x, 1))) > { > rtx op0 = get_addr (XEXP (x, 0)); > if (op0 != XEXP (x, 0)) > { > poly_int64 c; > if (GET_CODE (x) == PLUS > && poly_int_rtx_p (XEXP (x, 1), &c)) > return plus_constant (GET_MODE (x), op0, c); > return simplify_gen_binary (GET_CODE (x), GET_MODE (x), > op0, XEXP (x, 1)); > } > } > return x; > } > > and if get_addr already is the VALUE it should be then it will not create > any further PLUS/MINUS. > > One question is how often we do that > if (maybe_ne (offset, 0)) > mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset); > and with your patch also > mem_addr = get_addr (mem_addr); > if there are no active local stores (or just no canon_true_dependence > calls for that read). > If it would happen often, we could further optimize compile time and memory > by changing mem_addr above this plus_constant into mem_addr_base, > add mem_addr var initially set to NULL and before each of those 3 > canon_true_dependence calls do: > if (mem_addr == NULL_RTX) > { > mem_addr = mem_addr_base; > if (maybe_ne (offset, 0)) > mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset); > mem_addr = get_addr (mem_addr); > }
Sure, more micro-optimizing is possible, including passing a flag to canon_true_dependence whether the addr RTX already had get_addr called on it. And pass in the offset as poly-rtx-int and make get_addr apply it if not zero. But I've mostly tried to address the non-linearity here, after the patch the number of get_addr and plus_constant calls should be linear in the number of loads rather than O (#loads * #stores). I've also tried to find the most minimalistic change at this point (so it could be eventually backported). Richard.