https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048

--- Comment #21 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to Richard Henderson from comment #19)
> (In reply to Jiong Wang from comment #16)
> > But there is a performance issue as described at
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00281.html
> >
> >   "this patch forces register scaling expression out of memory ref, so that
> >    RTL CSE pass can handle common register scaling expressions"
> >
> > This is particularly performance critical if a group of instructions is
> > using the same "scaled register" inside a hot loop. CSE can reduce the
> > redundant calculations.
>
> I wish that message had been a bit more complete with the description
> of the performance issue.  I must guess from this...

Please check the documentation at
http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf,
page 14, the line describing "Load register, register offset, scale by 2".
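
For illustration, here is a hypothetical hot loop (mine, not from the bug
report) in which several loads share the same scaled index. If each load
keeps the scaling inside its memory reference, every load takes the slower
"scale by 2" register-offset form the guide calls out; forcing the scaling
out of the memory refs lets RTL CSE compute the shifted index once:

  /* Hypothetical example, not taken from the bug report.  All three
     HImode loads use the same scaled index expression "i * 2".  */
  short a[1024], b[1024], c[1024];

  int
  sum3 (int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      /* Without the patch each load can stay as
           ldrh  wN, [xB, xI, lsl #1]
         repeating the scaled form; with the scaling forced out, CSE
         can share a single "xT = xI << 1" between the three loads.  */
      sum += a[i] + b[i] + c[i];
    return sum;
  }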

> I'll note for the record that you cannot hope to solve this with
> the legitimize_address hook alone for the simple reason that it's not
> called for legitimate addresses, of which (base + index * 2) is
> a member.  The hook is only being called for illegitimate addresses.

Agreed. While double-checking the code, I found that for the
performance-related "scale by 2" situation, the aarch64 backend has already
made it an illegitimate address.

There is the following check in aarch64_classify_address; the "GET_MODE_SIZE
(mode) != 16" part is what catches that.

  bool allow_reg_index_p =
    !load_store_pair_p
    && (GET_MODE_SIZE (mode) != 16 || aarch64_vector_mode_supported_p (mode))
    && !aarch64_vect_struct_mode_p (mode);

So if the address is something like (base + index * 2) for a short access,
it will go through aarch64_legitimize_address. Thus I think your second
patch at #c10, with my minor modification, is still the proper fix for the
current stage.
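
For reference, here is a rough sketch of the kind of transformation the
patch performs inside that hook. This is my paraphrase for illustration,
not the actual patch; the helper and its name are hypothetical:

  /* Force the scaling computation into its own pseudo register so
     that RTL CSE can share it between nearby memory references, and
     keep only the plain (base + reg) form inside the address.  */
  static rtx
  split_scaled_index (rtx base, rtx index, HOST_WIDE_INT scale)
  {
    rtx scaled = force_reg (Pmode,
                            gen_rtx_MULT (Pmode, index,
                                          gen_int_mode (scale, Pmode)));
    return gen_rtx_PLUS (Pmode, base, scaled);
  }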
