[PATCH] D70157: Align branches within 32-Byte boundary

James Y Knight via Phabricator via cfe-commits Thu, 21 Nov 2019 15:16:15 -0800

jyknight added a comment.

Thanks for the comments, they help a little. But it's still somewhat confusing, 
so let me write down what seems to be happening:


- Before emitting every instruction, a new MCMachineDependentFragment is now 
emitted, of one of the multiple types:
  - For most instructions, that'll be BranchPrefix.
  - For things that need branch-alignment, it'll be BranchPadding, unless 
there's a fused conditional before, in which case it's BranchSplit
  - For fused conditionals, it'll be FusedJccPadding.
- After emitting an instruction that needs branch-alignment, all of those 
previously-emitted MCMachineDependentFragment are updated to point to the 
branch's fragment.
- Thus, every MCDataFragment now only contains a single instruction (this 
property is depended upon for getInstSize, at least).

All the MCMachineDependentFragments in a region bounded by a branch at the end 
and either a branch or a fragment-type which is not type in {FT_Data, 
FT_MachineDependent, FT_Relaxable, FT_CompactEncodedInst} at the beginning, 
will reference the ending branch instruction's fragment.

Then, when it comes time to do relaxation, every one of those 
machine-dependent-fragments has the opportunity to grow its instruction a 
little bit. The first instruction in a "block" will grow up to 5 segment 
prefixes (via modifying the BranchPrefix fragment), and then if more is needed, 
more prefixes will be added to the next instruction, and so on. Until you run 
out of instructions in the region. At which point the BranchPadding or 
FusedJccPadding types (right before the branch/fused-branch) will be able to 
emit nops to achieve the desired alignment.

An alternative would be to simply emit NOPs before branches as needed. That 
would be substantially simpler, since it would only require special handling 
for a branch or a fused-branch. I assume things were done this 
substantially-more-complex way in order to reduce performance cost of inserting 
NOP instructions? Are there numbers for how much better it is to use segment 
prefixes, vs a separate nop instruction? It seems a little bit surprising to me 
that it would be that important, but I don't know...

I'll note that the method here has the semantic issue of making it effectively 
impossible to ever evaluate an expression like ".if . - symbol == 24" (assuming 
we're emitting instructions), since every instruction can now change size. I 
suspect that will make it impossible to turn this on by default without 
breaking a lot of assembly code. Previously, only certain instructions, like 
branches or arithmetic ops with constant arguments of unknown value, could 
change size.



================
Comment at: llvm/lib/MC/MCAssembler.cpp:1058
+/// is the address of the symbol, which would casuse performance regression.
+void MCAssembler::moveSymbol(const MCFragment *Src, MCFragment *Dst) const {
+  if (!(Src && Dst && Dst->getKind() == MCFragment::FT_MachineDependent))
----------------
I don't think this is necessary. AFAICT, the symbols should already be in the 
right place -- pointing to the relax fragment, not the instruction itself, 
without this. And removing all this moveSymbol/updateSymbolMap code doesn't 
make any tests fail.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to