On Wed, 27 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 27, 2021 at 04:16:22PM +0100, Richard Biener wrote:
> > I can check but all immediate first uses of mem_addr are in
> > true_dependece_1 which does x_addr = get_addr (x_addr); as the
> > first thing on it.  So the concern would be that
> > get_addr (get_addr (x_addr)) != get_addr (x_addr) which I think
> > shouldn't be the case (no semantic difference at least).
> 
> I guess you're right, indeed I can't find other uses of mem_addr
> and it isn't saved, but just used in that one function, though
> perhaps against more than one addr.
> And in get_addr we most likely get into the:
> 
>   if (GET_CODE (x) != VALUE)
>     {
>       if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
>           && GET_CODE (XEXP (x, 0)) == VALUE
>           && CONST_SCALAR_INT_P (XEXP (x, 1)))
>         {
>           rtx op0 = get_addr (XEXP (x, 0));
>           if (op0 != XEXP (x, 0))
>             {
>               poly_int64 c;
>               if (GET_CODE (x) == PLUS
>                   && poly_int_rtx_p (XEXP (x, 1), &c))
>                 return plus_constant (GET_MODE (x), op0, c);
>               return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
>                                           op0, XEXP (x, 1));
>             }
>         }
>       return x;
>     }
> 
> and if get_addr already is the VALUE it should be then it will not create
> any further PLUS/MINUS.
> 
> One question is how often we do that
>   if (maybe_ne (offset, 0))
>     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
> and with your patch also
>   mem_addr = get_addr (mem_addr);
> if there are no active local stores (or just no canon_true_dependence
> calls for that read).
> If it would happen often, we could further optimize compile time and memory
> by changing mem_addr above this plus_constant into mem_addr_base,
> add mem_addr var initially set to NULL and before each of those 3
> canon_true_dependence calls do:
>   if (mem_addr == NULL_RTX)
>     {
>       mem_addr = mem_addr_base;
>       if (maybe_ne (offset, 0))
>       mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
>       mem_addr = get_addr (mem_addr);
>     }

Sure, more micro-optimizing is possible, including passing a flag
to canon_true_dependence whether the addr RTX already had get_addr
called on it.  And pass in the offset as poly-rtx-int and make
get_addr apply it if not zero.  But I've mostly tried to address
the non-linearity here, after the patch the number of get_addr
and plus_constant calls should be linear in the number of loads
rather than O (#loads * #stores).

I've also tried to find the most minimalistic change at this point
(so it could be eventually backported).

Richard.

Reply via email to