skan marked an inline comment as done. skan added a comment. In D70157#1755927 <https://reviews.llvm.org/D70157#1755927>, @jyknight wrote:
> Thanks for the comments, they help a little. But it's still somewhat > confusing, so let me write down what seems to be happening: > > - Before emitting every instruction, a new MCMachineDependentFragment is now > emitted, of one of the multiple types: > - For most instructions, that'll be BranchPrefix. > - For things that need branch-alignment, it'll be BranchPadding, unless > there's a fused conditional before, in which case it's BranchSplit > - For fused conditionals, it'll be FusedJccPadding. > - After emitting an instruction that needs branch-alignment, all of those > previously-emitted MCMachineDependentFragment are updated to point to the > branch's fragment. > - Thus, every MCDataFragment now only contains a single instruction (this > property is depended upon for getInstSize, at least). > > All the MCMachineDependentFragments in a region bounded by a branch at the > end and either a branch or a fragment-type which is not type in {FT_Data, > FT_MachineDependent, FT_Relaxable, FT_CompactEncodedInst} at the beginning, > will reference the ending branch instruction's fragment. > > Then, when it comes time to do relaxation, every one of those > machine-dependent-fragments has the opportunity to grow its instruction a > little bit. The first instruction in a "block" will grow up to 5 segment > prefixes (via modifying the BranchPrefix fragment), and then if more is > needed, more prefixes will be added to the next instruction, and so on. Until > you run out of instructions in the region. At which point the BranchPadding > or FusedJccPadding types (right before the branch/fused-branch) will be able > to emit nops to achieve the desired alignment. > > An alternative would be to simply emit NOPs before branches as needed. That > would be substantially simpler, since it would only require special handling > for a branch or a fused-branch. I assume things were done this > substantially-more-complex way in order to reduce performance cost of > inserting NOP instructions? Are there numbers for how much better it is to > use segment prefixes, vs a separate nop instruction? It seems a little bit > surprising to me that it would be that important, but I don't know... > > I'll note that the method here has the semantic issue of making it > effectively impossible to ever evaluate an expression like ".if . - symbol == > 24" (assuming we're emitting instructions), since every instruction can now > change size. I suspect that will make it impossible to turn this on by > default without breaking a lot of assembly code. Previously, only certain > instructions, like branches or arithmetic ops with constant arguments of > unknown value, could change size. Thanks for your detailed and accurate analysis for my code! I am sorry that this should have been done by me. ================ Comment at: llvm/lib/MC/MCAssembler.cpp:1058 +/// is the address of the symbol, which would casuse performance regression. +void MCAssembler::moveSymbol(const MCFragment *Src, MCFragment *Dst) const { + if (!(Src && Dst && Dst->getKind() == MCFragment::FT_MachineDependent)) ---------------- jyknight wrote: > I don't think this is necessary. AFAICT, the symbols should already be in the > right place -- pointing to the relax fragment, not the instruction itself, > without this. And removing all this moveSymbol/updateSymbolMap code doesn't > make any tests fail. Yes, I check it and you are right. I will removing all this moveSymbol/updateSymbolMap code. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70157/new/ https://reviews.llvm.org/D70157 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits