On Wed, Jul 22, 2020 at 12:03 PM Andrea Corallo <andrea.cora...@arm.com> wrote:
>
> Hi all,
>
> I'd like to submit the following two patches implementing a new
> AArch64-specific back-end pass that helps optimize branch-dense code,
> which can be a bottleneck for performance on some Arm cores. This is
> achieved by padding out the branch-dense sections of the instruction
> stream with nops.
>
> The original patch was already posted some time ago:
>
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html
>
> This follow-up splits the work into two patches as suggested, rebases
> on master and implements the suggestions of the first code review.
>
> This first patch implements the addition of a new RTX instruction class
> FILLER_INSN, which has been whitelisted to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of
> nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior with regard to standard UNSPEC nops.
>
> 1/2 is a requirement for 2/2. Please see the cover letter of the latter
> for more details on the pass itself.

I wonder if such an effect of instructions on the pipeline can be
modeled in the DFA, and thus whether the scheduler could issue
(always ready) NOPs?

I also wonder whether such an optimization is better suited to the
assembler, which should know instruction lengths and alignment more
precisely and would also know whether extra nops make immediates too
large for PC-relative things like short branches or section anchor
accesses (or whatever else)?

Richard.

> Regards
>
>   Andrea
>
> gcc/ChangeLog
>
> 2020-07-17  Andrea Corallo  <andrea.cora...@arm.com>
>             Carey Williams  <carey.willi...@arm.com>
>
>         * cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
>         * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
>         basic blocks.
>         * coretypes.h: New rtx class.
>         * emit-rtl.c (emit_filler_after): New function.
>         * rtl.def (FILLER_INSN): New rtl define.
>         * rtl.h (rtx_filler_insn): Define new structure.
>         (FILLER_INSN_P): New macro.
>         (is_a_helper <rtx_filler_insn *>::test): New test helper for
>         rtx_filler_insn.
>         (emit_filler_after): New extern.
>         * target-insns.def: Add target insn definition.
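
To make the range concern in the reply concrete (extra nops making
immediates too large for short PC-relative branches), here is a minimal
sketch in C. It is not code from the patch or from GCC. The helper names
disp_fits and forward_branch_fits_after_padding are hypothetical, and the
only architectural facts assumed are that AArch64 instructions are 4
bytes and that TBZ/TBNZ encode a signed 14-bit, word-scaled offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 instructions (including NOP) are 4 bytes wide.  */
enum { INSN_BYTES = 4 };

/* Hypothetical helper: return true if the byte displacement DISP can be
   encoded in a signed, word-scaled immediate of IMM_BITS bits
   (for example, 14 for TBZ/TBNZ).  */
static bool
disp_fits (int64_t disp, unsigned imm_bits)
{
  assert (imm_bits >= 2 && imm_bits <= 32);

  /* Branch targets must be instruction-aligned.  */
  if (disp % INSN_BYTES != 0)
    return false;

  int64_t words = disp / INSN_BYTES;
  int64_t max = ((int64_t) 1 << (imm_bits - 1)) - 1;
  int64_t min = -((int64_t) 1 << (imm_bits - 1));
  return words >= min && words <= max;
}

/* Hypothetical helper: for a forward branch with byte displacement DISP,
   check whether it still fits after NOPS nops are inserted between the
   branch and its target.  Each nop pushes the target INSN_BYTES further
   away.  */
static bool
forward_branch_fits_after_padding (int64_t disp, unsigned nops,
                                   unsigned imm_bits)
{
  return disp_fits (disp + (int64_t) nops * INSN_BYTES, imm_bits);
}
```

With a 14-bit immediate the furthest forward target is 8191 words
(32764 bytes) away, so a TBZ that is 32760 bytes from its target
tolerates one padding nop in between but not two. A pass that pads
before final layout has to account for this itself or rely on later
branch relaxation; an assembler sees the final displacements directly.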