https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #21 from Jiong Wang <jiwang at gcc dot gnu.org> ---
(In reply to Richard Henderson from comment #19)
> (In reply to Jiong Wang from comment #16)
> > But there is a performance issue as described at
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00281.html
> >
> > "this patch forces register scaling expression out of memory ref, so that
> > RTL CSE pass can handle common register scaling expressions"
> >
> > This is particularly performance-critical if a group of instructions
> > inside a hot loop all use the same "scaled register"; CSE can then remove
> > the redundant calculations.
>
> I wish that message had been a bit more complete with the description
> of the performance issue.  I must guess from this...

Please check the documentation at

  http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf

page 14, the line describing "Load register, register offset, scale by 2":
that addressing form is slower on Cortex-A57 than the unscaled
register-offset form.

> I'll note for the record that you cannot hope to solve this with
> the legitimize_address hook alone for the simple reason that it's not
> called for legitimate addresses, of which (base + index * 2) is
> a member.  The hook is only being called for illegitimate addresses.

Agreed.  While double-checking the code, I found that for the
performance-related "scale by 2" situation the aarch64 backend has already
made it an illegitimate address.  There is the following check in
aarch64_classify_address; the "GET_MODE_SIZE (mode) != 16" test is what
catches it:

  bool allow_reg_index_p =
    !load_store_pair_p
    && (GET_MODE_SIZE (mode) != 16
        || aarch64_vector_mode_supported_p (mode))
    && !aarch64_vect_struct_mode_p (mode);

So if the address is something like (base + index * 2) for a short access,
it will go through aarch64_legitimize_address.  Thus I think your second
patch at #c10, with my minor modification, is still the proper fix for the
current stage.
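To make the CSE point quoted above from comment #16 concrete, here is a
minimal sketch of my own (not taken from the patch discussion) of the kind
of code that benefits: several halfword loads share the same index, so once
the scaling expression is forced out of the memory references, RTL CSE can
compute it once per iteration instead of once per access.

  /* Hypothetical example, not from the bug report.  Kept inside the
     memrefs, each of the three loads below uses a "scale by 2"
     register-offset address, the slow form on Cortex-A57.  Forced out of
     the memrefs, the common scaling expression (i * 2) becomes ordinary
     arithmetic that CSE can share across all three loads, which then use
     the plain register-offset form.  */
  short
  sum3 (short *a, short *b, short *c, int n)
  {
    short s = 0;
    for (int i = 0; i < n; i++)
      s += a[i] + b[i] + c[i];
    return s;
  }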
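And as a rough sketch of the mechanism under discussion (an illustration of
the hook's shape only, not the actual patch from #c10): once
aarch64_classify_address rejects the scaled form, a legitimize_address
implementation can split the scaling out, e.g.:

  /* Illustration only -- not the code from #c10.  When the scaled form
     (plus BASE (mult INDEX 2)) has been rejected as illegitimate, force
     the scaled part into its own pseudo register.  Identical
     (mult INDEX 2) expressions emitted this way are then visible to the
     RTL CSE pass as ordinary arithmetic.  */
  static rtx
  sketch_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
                             machine_mode mode ATTRIBUTE_UNUSED)
  {
    if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 1)) == MULT)
      {
        /* force_reg emits a move of the scaled index into a fresh
           pseudo, so the load itself keeps an unscaled (base + reg)
           address.  */
        rtx scaled = force_reg (Pmode, XEXP (x, 1));
        return gen_rtx_PLUS (Pmode, XEXP (x, 0), scaled);
      }
    return x;
  }

The real hook naturally has to re-validate the result and handle the other
address shapes; this only shows why the transformation makes the scaling
expression CSE-able.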