craig.topper added a comment.

In D70157#1755927 <https://reviews.llvm.org/D70157#1755927>, @jyknight wrote:

> Thanks for the comments, they help a little. But it's still somewhat 
> confusing, so let me write down what seems to be happening:
>
> - Before emitting every instruction, a new MCMachineDependentFragment is now 
> emitted, of one of several types:
>   - For most instructions, that'll be BranchPrefix.
>   - For things that need branch-alignment, it'll be BranchPadding, unless 
> there's a fused conditional before, in which case it's BranchSplit
>   - For fused conditionals, it'll be FusedJccPadding.
> - After emitting an instruction that needs branch-alignment, all of those 
> previously-emitted MCMachineDependentFragment are updated to point to the 
> branch's fragment.
> - Thus, every MCDataFragment now only contains a single instruction (this 
> property is depended upon for getInstSize, at least).
>
>   All the MCMachineDependentFragments in a region bounded by a branch at the 
> end and, at the beginning, either a branch or a fragment whose type is not 
> one of {FT_Data, FT_MachineDependent, FT_Relaxable, FT_CompactEncodedInst} 
> will reference the ending branch instruction's fragment.
>
>   Then, when it comes time to do relaxation, every one of those 
> machine-dependent-fragments has the opportunity to grow its instruction a 
> little bit. The first instruction in a "block" will grow by up to 5 segment 
> prefixes (via modifying the BranchPrefix fragment), and then if more is 
> needed, more prefixes will be added to the next instruction, and so on, until 
> you run out of instructions in the region. At that point the BranchPadding 
> or FusedJccPadding types (right before the branch/fused-branch) will be able 
> to emit nops to achieve the desired alignment.
>
>   An alternative would be to simply emit NOPs before branches as needed. That 
> would be substantially simpler, since it would only require special handling 
> for a branch or a fused-branch. I assume things were done this 
> substantially-more-complex way in order to reduce performance cost of 
> inserting NOP instructions? Are there numbers for how much better it is to 
> use segment prefixes, vs a separate nop instruction? It seems a little bit 
> surprising to me that it would be that important, but I don't know...


I don't have any numbers myself. I was only involved in some of the internal 
code review. My understanding is that NOP instructions would place extra nop 
uops into the DSB (the decoded uop buffer), which limits how much performance 
can be recovered. Redundant prefixes generate no extra uops, so more 
performance is recovered.
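
To make the prefix-then-nop ordering described in the quoted summary concrete, 
here is a minimal sketch in Python. It is not the code from the patch: the 
function name and the flat "number of instructions" model are assumptions made 
for illustration, and it ignores real constraints such as the 15-byte x86 
instruction length limit. The only number taken from the discussion is the 
limit of 5 prefixes per instruction.

```python
# Hypothetical model, not the patch's implementation.
MAX_PREFIXES_PER_INST = 5  # limit mentioned in the quoted summary


def distribute_padding(num_insts, pad_bytes):
    """Spread pad_bytes over a region of num_insts instructions.

    Each instruction, in order, absorbs up to MAX_PREFIXES_PER_INST
    one-byte prefixes. Whatever is left over once the region runs out
    of instructions is returned as NOP bytes to emit before the branch.
    """
    prefixes = []
    remaining = pad_bytes
    for _ in range(num_insts):
        take = min(MAX_PREFIXES_PER_INST, remaining)
        prefixes.append(take)
        remaining -= take
    return prefixes, remaining


# Three instructions can absorb 12 bytes with no NOPs needed.
enough_room = distribute_padding(3, 12)   # ([5, 5, 2], 0)
# Two instructions absorb 10 bytes; the last 2 bytes become NOPs.
needs_nops = distribute_padding(2, 12)    # ([5, 5], 2)
```

In this model NOP bytes appear only when the region has too few instructions 
to absorb the padding, which is the case where the extra uops would show up.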

> I'll note that the method here has the semantic issue of making it 
> effectively impossible to ever evaluate an expression like ".if . - symbol == 
> 24" (assuming we're emitting instructions), since every instruction can now 
> change size. I suspect that will make it impossible to turn this on by 
> default without breaking a lot of assembly code. Previously, only certain 
> instructions, like branches or arithmetic ops with constant arguments of 
> unknown value, could change size.
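
The concern in the quoted paragraph can be sketched with a toy model in 
Python. The instruction sizes and the helper name below are hypothetical and 
nothing here comes from the patch. It only shows that the value of 
". - symbol" depends on how many prefix bytes relaxation later adds to the 
instructions in between, so the expression has no fixed value at the point 
where the ".if" is parsed.

```python
def distance_from_symbol(inst_sizes, extra_prefixes):
    """Bytes between a symbol placed before the first instruction and
    the current location '.' after the last one.

    inst_sizes: encoded size of each instruction without padding.
    extra_prefixes: prefix bytes added to each instruction by relaxation.
    """
    return sum(size + extra for size, extra in zip(inst_sizes, extra_prefixes))


sizes = [3, 5, 2]  # hypothetical encoded sizes, in bytes

no_padding = distance_from_symbol(sizes, [0, 0, 0])    # 10
with_padding = distance_from_symbol(sizes, [5, 1, 0])  # 16
```

The same three instructions give two different distances, depending on a 
decision that is made only during layout.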




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
