[PATCH] D70157: Align branches within 32-Byte boundary

James Y Knight via Phabricator via cfe-commits Tue, 26 Nov 2019 08:08:54 -0800

jyknight added a comment.

I have some other concerns about the code itself, but after pondering this a 
little bit, I'd like to instead bring up some rather more general concerns 
about the overall approach used here -- with some suggestions. (As these 
comments are not really comments on the _code_, but on the overall strategy, 
they also apply to the gnu binutils patch for this same feature.)


Of course, echoing chandler, it would be nice to see some performance numbers, 
otherwise it's not clear how useful any of this is.

Segment-prefix padding
----------------------

The ability to pad instructions instead of adding a multibyte-NOP in order to 
create alignment seems like a generally-useful feature, which should be usable 
in other situations where we align within a function -- assuming that there is 
indeed a measurable performance benefit vs NOP instructions. E.g. we should do 
this for basic-block alignment, as well! As such, it feels like this feature 
ought to be split out, and implemented separately from the new branch-alignment 
functionality -- in a way which is usable for any kind of alignment request.

The way I'd imagine it working is to introduce a new pair of asm directives, to 
enable and disable segment-prefix padding in a certain range of instructions 
(let's say ".enable_instr_prefix_pad", ".disable_instr_prefix_pad". Within such 
an enabled range, the assembler would prefix instructions as required in order 
to handle later nominally-nop-emitting code alignment directives (such as the 
usual '.p2align 4, 0x90') .

Branch alignment
----------------

The primary goal of this patch, restricting the placement of branch 
instructions, is a performance optimization. Similar to loop alignment, the 
desire is to increase speed, at the cost of code-size. However, the way this 
feature has been designed is a global assembler flag. I find that not ideal, 
because it cannot take into account hotness of a block/function, as for example 
loop alignment code does. Basic-block alignment of loops is explicitly opt-in 
on an block-by-block basis -- the compiler simply emits a p2align directive 
where it needs, and the assembler honors that. And so, 
MachineBlockPlacement::alignBlocks has a bunch of conditions under which it 
will avoid emitting a p2align. This seems like a good model -- the assembler 
does what it's told by the compiler (or assembly-writer). Making the 
branch-instruction-alignment work similarly seems like it would be good.

IMO it would be nicest if there could be a directive that requests to 
specially-align the next instruction. However, the fused-jcc case makes that 
quite tricky, so perhaps this ought to also be a mode which can be 
enabled/disabled on a region as well.

Enabling by default
-------------------

Previously, I'd mentioned that it seemed likely we couldn't actually enable 
branch-alignment by default because it'll probably break people's inline-asm 
and standalone asm files. That would be solved by making everything 
controllable within the asm file. The compiler could then insert the directives 
for its own code, and disable it around inline assembly. And standalone asm 
files could remain unaffected, unless they opt in. With that, we could actually 
enable the alignment by default, for compiled output in certain cpu-tuning 
modes, if it's warranted.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to