On Mon, Jul 31, 2023 at 03:53:26PM +0200, Richard Biener wrote:
[snip]
> > The main difference in the compilation output around the mispredicted
> > branch is:
> >   o In O2: a predicated instruction (cmov here) is selected to eliminate
> >     the above branch. cmov is truly better than a branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and a branch
> >     instruction is selected. But this branch is obviously *unpredictable*
> >     and the compiler doesn't know it. This is why O3/PGO are so bad for
> >     this program.
> >
> > GCC doesn't support __builtin_unpredictable(), which was introduced by
> > LLVM. So I tried to see whether
> > __builtin_expect_with_probability(e, x, 0.5) can serve the same purpose.
> > The result is negative.
> 
> But does it appear to be predictable with your profiling data?
>
I profiled the branch-misses event on a Kaby Lake machine. Over 99% of the
mispredictions are attributed to the encode_file() function.

$ sudo perf record -e branch-instructions:pp,branch-misses:pp -c 1000 -- taskset -c 0 ./huffman.O3 test.data

Samples: 197K of event 'branch-misses:pp', Event count (approx.): 197618000
Overhead  Command     Shared Object     Symbol
  99.58%  huffman.O3  huffman.O3        [.] encode_file
   0.12%  huffman.O3  [kernel.vmlinux]  [k] __x86_indirect_thunk_array
   0.11%  huffman.O3  libc-2.31.so      [.] _IO_getc
   0.01%  huffman.O3  [kernel.vmlinux]  [k] common_file_perm

Then I annotated the encode_file() function:

Samples: 197K of event 'branch-misses:pp', 1000 Hz, Event count (approx.): 197618000
encode_file  /work/myWork/linux/pgo/huffman.O3 [Percent: local period]
Percent│     ↑ je      38
       │     bitout():
       │     current_byte <<= 1;
       │ 70:   add     %edi,%edi
       │     if (b == '1') current_byte |= 1;
 48.70 │    ┌──cmp     $0x31,%dl
 47.11 │    ├──jne     7a
       │    │  or      $0x1,%edi
       │    │nbits++;  
       │ 7a:└─→inc     %eax
       │     if (b == '1') current_byte |= 1;
       │       mov     %edi,current_byte
       │     nbits++;
       │       mov     %eax,nbits
       │     if (nbits == 8) {
  1.16 │       cmp     $0x8,%eax
  3.03 │     ↓ je      a0
       │     encode_file():
       │     for (s=codes[ch]; *s; s++) bitout (outfile, *s);
       │       movzbl  0x1(%r13),%edx
       │       inc     %r13
       │       test    %dl,%dl
       │     ↑ jne     70
       │     ↑ jmp     38
       │       nop
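
For illustration, here is a minimal sketch of how the data-dependent update
in the annotated lines (the cmp/jne pair carrying ~96% of the misses) could
be written without a conditional in the source. Only the two statements
"current_byte <<= 1;" and "if (b == '1') current_byte |= 1;" come from the
program; the function names and the string-based interface below are
hypothetical and exist only to make the example self-contained. Whether the
compiler then emits a setcc/cmov sequence or still a branch remains its
decision, so the generated code has to be checked.

```c
/* Branchy form: mirrors the statements shown in the annotation.
 * Hypothetical helper, not taken from huffman.c. */
unsigned pack_bits_branchy(const char *s)
{
    unsigned byte = 0;
    for (; *s; s++) {
        byte <<= 1;
        if (*s == '1')          /* data-dependent branch */
            byte |= 1;
    }
    return byte;
}

/* Branchless form: in C, (x == y) evaluates to the int 0 or 1,
 * so the comparison result can be OR-ed in directly. */
unsigned pack_bits_branchless(const char *s)
{
    unsigned byte = 0;
    for (; *s; s++)
        byte = (byte << 1) | (unsigned)(*s == '1');
    return byte;
}
```

Both helpers return the same value for any string of '0'/'1' characters;
the second one just leaves the optimizer no source-level branch to keep.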

-- 
Cheers,
Changbin Du
