On Mon, Jul 31, 2023 at 03:53:26PM +0200, Richard Biener wrote:
[snip]
> > The main difference in the compilation output about code around the
> > miss-prediction branch is:
> >   o In O2: predicated instruction (cmov here) is selected to eliminate above
> >     branch. cmov is true better than branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and branch instruction
> >     is selected. But this branch is obviously *unpredictable* and the compiler
> >     doesn't know it. This is why O3/PGO are so bad for this program.
> >
> > Gcc doesn't support __builtin_unpredictable() which has been introduced by
> > llvm. Then I tried to see if __builtin_expect_with_probability(e,x, 0.5) can
> > serve the same purpose. The result is negative.
>
> But does it appear to be predictable with your profiling data?
>
I profiled the branch-misses event on a Kaby Lake machine. 99% of the
mispredictions are attributed to the encode_file() function.

$ sudo perf record -e branch-instructions:pp,branch-misses:pp -c 1000 -- taskset -c 0 ./huffman.O3 test.data

Samples: 197K of event 'branch-misses:pp', Event count (approx.): 197618000
Overhead  Command     Shared Object     Symbol
  99.58%  huffman.O3  huffman.O3        [.] encode_file
   0.12%  huffman.O3  [kernel.vmlinux]  [k] __x86_indirect_thunk_array
   0.11%  huffman.O3  libc-2.31.so      [.] _IO_getc
   0.01%  huffman.O3  [kernel.vmlinux]  [k] common_file_perm

Then annotate encode_file() function:

Samples: 197K of event 'branch-misses:pp', 1000 Hz, Event count (approx.): 197618000
encode_file  /work/myWork/linux/pgo/huffman.O3 [Percent: local period]
Percent│
       │     ↑ je      38
       │     bitout():
       │     current_byte <<= 1;
       │ 70:   add     %edi,%edi
       │     if (b == '1') current_byte |= 1;
 48.70 │  ┌──cmp     $0x31,%dl
 47.11 │  ├──jne     7a
       │  │  or      $0x1,%edi
       │  │  nbits++;
       │ 7a:└─→inc     %eax
       │     if (b == '1') current_byte |= 1;
       │       mov     %edi,current_byte
       │     nbits++;
       │       mov     %eax,nbits
       │     if (nbits == 8) {
  1.16 │       cmp     $0x8,%eax
  3.03 │     ↓ je      a0
       │     encode_file():
       │     for (s=codes[ch]; *s; s++) bitout (outfile, *s);
       │       movzbl  0x1(%r13),%edx
       │       inc     %r13
       │       test    %dl,%dl
       │     ↑ jne     70
       │     ↑ jmp     38
       │       nop

--
Cheers,
Changbin Du
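
For reference, the cmp/jne pair carrying about 96% of the sampled misses
corresponds to the "if (b == '1') current_byte |= 1;" line in bitout(). Below
is a minimal sketch of that update in branchy and branchless form. It is not
taken from the huffman source: the helper names push_bit_branchy() and
push_bit_branchless() are hypothetical, and only the statement text is copied
from the annotate output above. The compiler remains free to lower either form
to a branch, cmov, or plain arithmetic, so this only illustrates the data
dependence, not a guaranteed fix.

```c
/* Hypothetical helpers; bitout() itself is not reproduced here. */

/* Branchy form, mirroring the source lines shown by perf annotate:
 *     current_byte <<= 1;
 *     if (b == '1') current_byte |= 1;
 */
static unsigned int push_bit_branchy(unsigned int byte, char b)
{
    byte <<= 1;
    if (b == '1')
        byte |= 1;
    return byte;
}

/* Branchless form: in C the comparison (b == '1') evaluates to 0 or 1,
 * so it can be OR-ed directly into the low bit without a conditional. */
static unsigned int push_bit_branchless(unsigned int byte, char b)
{
    return (byte << 1) | (unsigned int)(b == '1');
}
```

Both helpers return the same value for every input; they differ only in
whether the source expresses the update as control flow or as data flow.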