manosanag added a comment.

Hello Dave,

thanks for replying.

Yes, this is an optimization.

On some AArch64 cores, including Ampere's ampere1 architecture, which this patch 
targets, load/store pair instructions are faster than simple loads/stores only 
when the alignment of the pair is at least twice that of the individual element 
being loaded. Based on the performance of various benchmarks, emitting ldp/stp 
instructions was disabled in GCC at some point (see the discussion at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615672.html). This patch 
improves on that and offers control over when the instructions are used.
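
To make the alignment condition concrete, here is a minimal sketch of the check 
as I read it. The helper name isPairAligned and its parameters are hypothetical 
and are not taken from the patch; the actual implementation works on the 
alignment information available in the backend, not on raw addresses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the patch: returns true when an access
// starting at `address` is aligned to twice the size of one element, which is
// the condition under which ldp/stp is described as profitable above.
constexpr bool isPairAligned(std::uint64_t address, std::size_t elementSize) {
  // A pair covers two consecutive elements, so its natural alignment is
  // twice the element size.
  const std::uint64_t pairSize = 2 * static_cast<std::uint64_t>(elementSize);
  // Guard against a zero element size to avoid a modulo by zero.
  return pairSize != 0 && address % pairSize == 0;
}
```

For example, two 8-byte elements starting at address 16 satisfy the condition, 
while the same pair starting at address 8 does not.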

A similar patch with the same flags was recently submitted for review on the 
GCC mailing list 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628590.html).

I have a fix ready for the Fortran regressions shown by autotesting. I can 
include some of this information in the commit message of the diff.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159480/new/

https://reviews.llvm.org/D159480

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
