On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski <pins...@gmail.com> wrote:
> Two overall comments:
> * What about splitting register_offset into two different elements,
> one for non 128bit modes and one for 128bit (and more; OI, etc.) modes
> so you get better address generation right away for the simd load
> cases rather than having LRA/reload having to reload the address into
> a register.

I'm not sure if changing register_offset cost would make a difference,
since costs are usually used during optimization, not during address
generation. This is something that I didn't think to try though. I
can try taking a look at this.

I did try writing a patch to modify predicates to disallow reg offset
for 128bit modes, and that got complicated, as I had to split apart a
number of patterns in the aarch64-simd.md file that accept both VD and
VQ modes. I ended up with a patch 3-4 times as big as the one I
submitted, without any additional performance improvement, so it
wasn't worth the trouble.

> * Maybe adding a testcase to the testsuite to show this change.

Yes, I can add a testcase.

> One extra comment:
> * should we change the generic tuning to avoid reg+reg for 128bit modes?

Are there other targets with a similar problem? I only know that it
is a problem for Falkor. It might be a loss for some targets as it is
replacing one instruction with two.

Jim