> On 12/12/2021 3:49 AM, Jan Hubicka via Gcc-patches wrote:
> > Hi,
> > As discussed in the PR, we miss some optimization because
> > gimple-ssa-isolate-paths turns NULL memory accesses to volatile and adds
> > __builtin_trap after them.  This is seen as a side-effect by IPA analysis
> > and additionally the (fully unreachable) builtin_trap is believed to load
> > all global memory.
> > 
> > I think we should think of a less intrusive gimple representation of this, but
> > it is also easy enough to special case that in IPA analysers as done in
> > this patch.  This is a win even if we improve the representation since
> > gimple-ssa-isolate-paths is run late and this way we improve optimization
> > early.
> So what's important about the IR representation is that we keep some kind of
> memory access around (so that it will fault), that the memory access
> reference a minimal amount of other stuff (we don't want the address
> computation to keep anything live for example) and that we have a subsequent
> trap so that the CFG routines know it traps.
> 
> Otherwise we're free to do whatever we want.

I was thinking about __builtin_trap_load (ptr) and __builtin_trap_store (ptr),
which we could annotate correctly for PTA and which would avoid the need for
two statements, but I am not sure if this is a good idea.
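
For reference, a minimal sketch (my own example, not taken from the PR; the
function and variable names are made up) of the kind of code that triggers
the transformation, with comments describing roughly what
gimple-ssa-isolate-paths produces today:

  int g;

  int
  use (int *p)
  {
    if (!p)
      g = *p;   /* On this path p is known to be NULL, so isolate-paths
                   rewrites the load into a volatile access of 0B and adds
                   __builtin_trap () after it; that pair is what IPA
                   currently sees as a side effect.  A single
                   __builtin_trap_load (p) style statement would let us
                   annotate it precisely instead.  */
    return *p;
  }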

Coincidentally, I just ran across another artifact of this.  One of the
hottest functions in clang is getTerminator.  Clang builds it as:

       │    000000000122f4b0 <llvm::BasicBlock::getTerminator() const>:
       │    llvm::BasicBlock::getTerminator() const:
  2.16 │      mov    0x28(%rdi),%rax
 28.85 │      add    $0x28,%rdi
  0.20 │      cmp    %rax,%rdi
  3.55 │    ↓ je     29
       │      lea    -0x18(%rax),%rcx
  0.79 │      test   %rax,%rax
       │      cmove  %rax,%rcx
  4.15 │      movzbl 0x10(%rcx),%edx
 48.46 │      add    $0xffffffe5,%edx
  3.95 │      xor    %eax,%eax
  0.20 │      cmp    $0xb,%edx
  3.35 │      cmovb  %rcx,%rax
  4.33 │    ← ret
       │29:   xor    %eax,%eax
       │    ← ret

While we do:

       │
       │    0000000001471840 <llvm::BasicBlock::getTerminator() const>:
       │    llvm::BasicBlock::getTerminator() const:
  3.24 │      mov    0x28(%rdi),%rax
 31.31 │      add    $0x28,%rdi
  0.59 │      cmp    %rdi,%rax
  5.31 │    ↓ je     30
       │      test   %rax,%rax
  2.36 │    → je     9366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>
       │      movzbl -0x8(%rax),%edx
 45.73 │      sub    $0x18,%rax
       │      sub    $0x1b,%edx
  4.70 │      cmp    $0xb,%edx
  3.53 │      mov    $0x0,%edx
       │      cmovae %rdx,%rax
  3.23 │    ← ret
       │      xchg   %ax,%ax
       │30:   xor    %eax,%eax
       │    ← ret

....

       │
       │   00000000009366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>:
       │   llvm::BasicBlock::getTerminator() const [clone .cold]:
       │     movzbl 0x10,%eax
       │     ud2

The clang generated code obviously loads through a possibly NULL pointer on
that path, but replacing this cmov by a conditional jump to the cold section
seems like overkill.  The load from address 0x10 into eax there also seems
useless...

I think this is a common pattern in C++ code, originating from casts with
multiple inheritance.  I would vote for optimizing out the conditional
move in this case, and I think that is correct.
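
To make the pattern concrete, here is a minimal sketch of my own (not the
actual LLVM sources; the class names and the from_node helper are made up):
a downcast from a pointer to a non-first base class has to subtract an
offset while preserving NULL, which is where the test/cmove pair and the
guarded load in the dumps above come from:

  struct node { node *prev, *next; };     /* non-first base */

  struct instr
  {
    virtual ~instr () {}
    unsigned char opcode;
  };

  struct list_instr : instr, node { };

  static inline list_instr *
  from_node (node *n)
  {
    /* static_cast from a non-first base preserves NULL, so it expands to
       roughly  n ? (list_instr *)((char *)n - offset) : 0,  which is the
       lea/test/cmove sequence above.  */
    return static_cast<list_instr *> (n);
  }

  unsigned char
  opcode_of (node *n)
  {
    /* This load goes through a pointer the compiler cannot prove non-NULL;
       it is this kind of dereference that gcc isolates into a .cold clone
       ending in ud2, while clang keeps the cmov and the straight-line
       load.  */
    return from_node (n)->opcode;
  }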

Honza
