> > > On 12/12/2021 3:49 AM, Jan Hubicka via Gcc-patches wrote:
> > Hi,
> > As discussed in the PR, we miss some optimizations because
> > gimple-ssa-isolate-paths turns NULL memory accesses into volatile ones and
> > adds __builtin_trap after them.  This is seen as a side effect by IPA
> > analysis, and additionally the (fully unreachable) builtin_trap is believed
> > to load all global memory.
> >
> > I think we should think of a less intrusive gimple representation of this,
> > but it is also easy enough to special-case it in the IPA analysers, as done
> > in this patch.  This is a win even if we improve the representation, since
> > gimple-ssa-isolate-paths is run late and this way we improve optimization
> > early.
> So what's important about the IR representation is that we keep some kind of
> memory access around (so that it will fault), that the memory access
> references a minimal amount of other stuff (we don't want the address
> computation to keep anything live, for example) and that we have a subsequent
> trap so that the CFG routines know it traps.
>
> Otherwise we're free to do whatever we want.
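To make the representation issue concrete, here is a rough source-level
sketch of what the pass emits today (a hand-written approximation, not
actual GIMPLE output; function names are made up):

  /* A dereference gimple-ssa-isolate-paths proves is reached only
     with p == NULL...  */
  int
  original (int *p)
  {
    return *p;
  }

  /* ...is rewritten on the isolated path into a minimal volatile
     access plus an explicit trap, roughly:  */
  int
  isolated (int *p)
  {
    *(volatile int *) 0;   /* volatile access: kept alive, faults.  */
    __builtin_trap ();     /* tells the CFG this path cannot continue.  */
  }
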
I was thinking about __builtin_trap_load (ptr) and __builtin_trap_store (ptr),
which we could annotate correctly for PTA and which would avoid the need for
two statements, but I am not sure if this is a good idea.

Coincidentally, I just ran across another artifact of this.  One of the
hottest functions in clang is getTerminator.  Clang builds it as:

       │    000000000122f4b0 <llvm::BasicBlock::getTerminator() const>:
       │    llvm::BasicBlock::getTerminator() const:
  2.16 │      mov    0x28(%rdi),%rax
 28.85 │      add    $0x28,%rdi
  0.20 │      cmp    %rax,%rdi
  3.55 │    ↓ je     29
       │      lea    -0x18(%rax),%rcx
  0.79 │      test   %rax,%rax
       │      cmove  %rax,%rcx
  4.15 │      movzbl 0x10(%rcx),%edx
 48.46 │      add    $0xffffffe5,%edx
  3.95 │      xor    %eax,%eax
  0.20 │      cmp    $0xb,%edx
  3.35 │      cmovb  %rcx,%rax
  4.33 │    ← ret
       │29:   xor    %eax,%eax
       │    ← ret

While we do:

       │    0000000001471840 <llvm::BasicBlock::getTerminator() const>:
       │    llvm::BasicBlock::getTerminator() const:
  3.24 │      mov    0x28(%rdi),%rax
 31.31 │      add    $0x28,%rdi
  0.59 │      cmp    %rdi,%rax
  5.31 │    ↓ je     30
       │      test   %rax,%rax
  2.36 │    → je     9366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>
       │      movzbl -0x8(%rax),%edx
 45.73 │      sub    $0x18,%rax
       │      sub    $0x1b,%edx
  4.70 │      cmp    $0xb,%edx
  3.53 │      mov    $0x0,%edx
       │      cmovae %rdx,%rax
  3.23 │    ← ret
       │      xchg   %ax,%ax
       │30:   xor    %eax,%eax
       │    ← ret

....

       │    00000000009366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>:
       │    llvm::BasicBlock::getTerminator() const [clone .cold]:
       │      movzbl 0x10,%eax
       │      ud2

The clang-generated code obviously does a conditional load through a NULL
pointer, but replacing this cmov with a conditional jump to the cold section
seems overkill, and the load from address 0x10 into eax there seems useless...

I think this is a common pattern in C++ code, originating from casts with
multiple inheritance (see the sketch below).  I would vote towards
optimizing out the conditional move in this case, and I think it is correct.

Honza
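P.S.  A minimal sketch of the cast pattern I mean (hypothetical types, not
the actual LLVM sources; with the second base at offset 0x18, a null-guarded
downcast compiles to exactly the lea/test/cmove triple seen above):

  // B sits at offset 0x18 inside C (behind A's vptr plus pad),
  // so converting a B* to a C* must subtract 0x18 -- but only
  // for non-null pointers, hence the conditional move.
  struct A { virtual ~A () {} long pad[2]; };
  struct B { virtual ~B () {} unsigned char kind; };
  struct C : A, B { };

  C *
  to_c (B *p)
  {
    /* Compiles to roughly:
         lea -0x18(%rax),%rcx; test %rax,%rax; cmove %rax,%rcx  */
    return static_cast<C *> (p);
  }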