[PATCH] D70157: Align branches within 32-Byte boundary

Kan Shengchen via Phabricator via cfe-commits Thu, 21 Nov 2019 20:34:05 -0800

skan marked an inline comment as done.
skan added a comment.

In D70157#1755927 <https://reviews.llvm.org/D70157#1755927>, @jyknight wrote:


> Thanks for the comments, they help a little. But it's still somewhat 
> confusing, so let me write down what seems to be happening:
>
> - Before emitting every instruction, a new MCMachineDependentFragment is now 
> emitted, of one of the multiple types:
>   - For most instructions, that'll be BranchPrefix.
>   - For things that need branch-alignment, it'll be BranchPadding, unless 
> there's a fused conditional before, in which case it's BranchSplit
>   - For fused conditionals, it'll be FusedJccPadding.
> - After emitting an instruction that needs branch-alignment, all of those 
> previously-emitted MCMachineDependentFragment are updated to point to the 
> branch's fragment.
> - Thus, every MCDataFragment now only contains a single instruction (this 
> property is depended upon for getInstSize, at least).
>
>   All the MCMachineDependentFragments in a region bounded by a branch at the 
> end and either a branch or a fragment-type which is not type in {FT_Data, 
> FT_MachineDependent, FT_Relaxable, FT_CompactEncodedInst} at the beginning, 
> will reference the ending branch instruction's fragment.
>
>   Then, when it comes time to do relaxation, every one of those 
> machine-dependent-fragments has the opportunity to grow its instruction a 
> little bit. The first instruction in a "block" will grow up to 5 segment 
> prefixes (via modifying the BranchPrefix fragment), and then if more is 
> needed, more prefixes will be added to the next instruction, and so on. Until 
> you run out of instructions in the region. At which point the BranchPadding 
> or FusedJccPadding types (right before the branch/fused-branch) will be able 
> to emit nops to achieve the desired alignment.
>
>   An alternative would be to simply emit NOPs before branches as needed. That 
> would be substantially simpler, since it would only require special handling 
> for a branch or a fused-branch. I assume things were done this 
> substantially-more-complex way in order to reduce performance cost of 
> inserting NOP instructions? Are there numbers for how much better it is to 
> use segment prefixes, vs a separate nop instruction? It seems a little bit 
> surprising to me that it would be that important, but I don't know...
>
>   I'll note that the method here has the semantic issue of making it 
> effectively impossible to ever evaluate an expression like ".if . - symbol == 
> 24" (assuming we're emitting instructions), since every instruction can now 
> change size. I suspect that will make it impossible to turn this on by 
> default without breaking a lot of assembly code. Previously, only certain 
> instructions, like branches or arithmetic ops with constant arguments of 
> unknown value, could change size.


Thanks for your detailed and accurate analysis for my code! I am sorry that 
this should have been done by me.



================
Comment at: llvm/lib/MC/MCAssembler.cpp:1058
+/// is the address of the symbol, which would casuse performance regression.
+void MCAssembler::moveSymbol(const MCFragment *Src, MCFragment *Dst) const {
+  if (!(Src && Dst && Dst->getKind() == MCFragment::FT_MachineDependent))
----------------
jyknight wrote:
> I don't think this is necessary. AFAICT, the symbols should already be in the 
> right place -- pointing to the relax fragment, not the instruction itself, 
> without this. And removing all this moveSymbol/updateSymbolMap code doesn't 
> make any tests fail.
Yes, I check it and you are right. I will removing all this 
moveSymbol/updateSymbolMap code.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to