https://llvm.org/bugs/show_bug.cgi?id=24620

            Bug ID: 24620
           Summary: BranchProbabilities::scale is a very hot function but
                    its assembly is very inefficient.
           Product: libraries
           Version: trunk
          Hardware: HP
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Support Libraries
          Assignee: unassignedb...@nondot.org
          Reporter: cmt...@google.com
                CC: llvm-bugs@lists.llvm.org
    Classification: Unclassified

Created attachment 14791
  --> https://llvm.org/bugs/attachment.cgi?id=14791&action=edit
gzip'd .ii file

While recently examining a performance problem in clang (8x slower than GCC,
see https://llvm.org/bugs/show_bug.cgi?id=24618), we looked at the results of
running 'perf' on clang and saw that in this case the hottest function was
llvm::BranchProbabilities::scale (20.69% of the entire compilation was being
spent in this function).

Looking more closely at the function's assembly, annotated with perf results,
we saw:

  0.08 │      xor    %edx,%edx
  0.15 │      imul   %rax,%rdi
  2.51 │      shr    $0x20,%rcx
  0.00 │      imul   %rax,%rcx
  0.93 │      mov    %rdi,%rsi
  0.45 │      mov    %rcx,%rax
  0.86 │      shr    $0x20,%rsi
  0.69 │      shr    $0x20,%rax
  1.01 │      add    %esi,%ecx
  0.41 │      mov    $0xffffffffffffffff,%rsi
  0.26 │      setb   %dl
  0.55 │      add    %edx,%eax
  0.85 │      cmp    %eax,%r8d
       │    ↓ ja     50
       │49:   mov    %rsi,%rax
  1.33 │    ← retq
       │      nop
  0.93 │50:   shl    $0x20,%rax
  0.33 │      mov    %ecx,%ecx
       │      xor    %edx,%edx
  0.05 │      or     %rcx,%rax
  1.00 │      mov    $0xffffffff,%r9d
  0.27 │      div    %r8
 32.45 │      cmp    %r9,%rax
  1.14 │      mov    %rax,%rcx
  0.74 │    ↑ ja     49
  0.98 │      mov    %rdx,%rax
  0.08 │      mov    %edi,%edi
  0.03 │      xor    %edx,%edx
  0.40 │      shl    $0x20,%rax
  0.94 │      shl    $0x20,%rcx
  0.03 │      or     %rdi,%rax
  0.50 │      div    %r8
 43.53 │      add    %rcx,%rax
  1.25 │      cmovae %rax,%rsi
  2.61 │    ↑ jmp    49


It appears that about 76% of the time in this function is being spent on the
two 'div' ops (32.45% and 43.53%; perf shows each cost against the instruction
that follows the div). This assembly is very inefficient: the two divs ought
to be done together, thus possibly halving the time spent in this function.
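
For anyone following the disassembly, here is a rough C++ sketch of what the
code above appears to compute: a 64-bit value times a 32-bit numerator, divided
by a 32-bit denominator, saturating on overflow. It is reconstructed from the
assembly, not copied from the LLVM sources, so the function name scale_sketch
and all variable names are hypothetical, and any early-exit checks before the
excerpt are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction from the disassembly; not the LLVM source.
// Returns Num * N / D, saturating to UINT64_MAX on overflow. Assumes D != 0.
inline uint64_t scale_sketch(uint64_t Num, uint32_t N, uint32_t D) {
  assert(D != 0 && "divide by zero");

  // 64x32 multiply done as two 32x32 -> 64 products (the two imuls).
  uint64_t ProductHigh = (Num >> 32) * N;
  uint64_t ProductLow = (Num & UINT32_MAX) * N;

  // Split the 96-bit product into three 32-bit digits.
  uint32_t Upper32 = static_cast<uint32_t>(ProductHigh >> 32);
  uint32_t Lower32 = static_cast<uint32_t>(ProductLow);
  uint32_t Mid32Partial = static_cast<uint32_t>(ProductHigh);
  uint32_t Mid32 = Mid32Partial + static_cast<uint32_t>(ProductLow >> 32);

  // Propagate the carry out of the middle digit (the add/setb pair).
  Upper32 += (Mid32 < Mid32Partial) ? 1u : 0u;

  // The quotient would not fit in 64 bits (the cmp/ja before label 50).
  if (Upper32 >= D)
    return UINT64_MAX;

  // First div: upper two digits divided by D.
  uint64_t Rem = (static_cast<uint64_t>(Upper32) << 32) | Mid32;
  uint64_t UpperQ = Rem / D;
  if (UpperQ > UINT32_MAX)
    return UINT64_MAX;

  // Second div: its dividend is built from the remainder of the first.
  Rem = ((Rem % D) << 32) | Lower32;
  uint64_t LowerQ = Rem / D;
  uint64_t Q = (UpperQ << 32) + LowerQ;

  // Final overflow check (the add/cmovae at the end).
  return Q < LowerQ ? UINT64_MAX : Q;
}
```

For example, scale_sketch(10, 3, 4) follows the non-overflow path through both
divisions. Note that in this reading the second division consumes the remainder
of the first, so any restructuring of the two divs would have to account for
that dependency.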

(This is on Intel x86_64, BTW, in case it's not obvious from the assembly).

This is with ToT Clang/LLVM, configured and built with:

$ cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/tmp/llvm-install.opt
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=On  <path-to-llvm>
$ make all
$ make install

Attached is a gzip'd version of the .ii file we used.  The clang command to
compile this file is:

/usr/local/google2/cmtice/llvm-work/llvm-install.opt/bin/clang++  -c   
-fno-exceptions -Wno-multichar -m64 -Wa,--noexecstack -fPIC
-no-canonical-prefixes  -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector
-D__STDC_FORMAT_MACROS -D__STDC_CONSTANT_MACROS -DANDROID -fmessage-length=0 -W
-Wall -Wno-unused     -Winit-self -Wpointer-arith -g -fno-strict-aliasing
-DNDEBUG -UDEBUG           -D__compiler_offsetof=__builtin_offsetof
-Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic
-Wno-unused-command-line-argument   -target x86_64-linux-gnu   -DANDROID
-fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith
-Wsign-promo -DNDEBUG -UDEBUG  -Wno-inconsistent-missing-override   -target
x86_64-linux-gnu  -DBUILDING_LIBART=1 -Wthread-safety -Wthread-safety-negative
-Wimplicit-fallthrough -Wfloat-equal -Wint-to-void-pointer-cast
-Wused-but-marked-unused -Wdeprecated -Wunreachable-code-break
-Wunreachable-code-return -Wmissing-noreturn -fno-omit-frame-pointer -fno-rtti
-std=gnu++11 -ggdb3 -Wall -Werror -Wextra -Wstrict-aliasing -fstrict-aliasing
-Wunreachable-code -Wredundant-decls -Wshadow -Wunused -fvisibility=protected
-DART_DEFAULT_GC_TYPE_IS_CMS -DIMT_SIZE=64 -DART_BASE_ADDRESS=0x60000000
-DART_DEFAULT_INSTRUCTION_SET_FEATURES=default
-DART_BASE_ADDRESS_MIN_DELTA=-0x1000000 -DART_BASE_ADDRESS_MAX_DELTA=0x1000000
-DART_DEFAULT_INSTRUCTION_SET_FEATURES="default" -O3 -Wframe-larger-than=2700
-fPIC -D_USING_LIBCXX -std=gnu++14 -nostdinc++  -Werror=int-to-pointer-cast
-Werror=pointer-to-int-cast  -Werror=address-of-temporary
-Werror=null-dereference -Werror=return-type -o interpreter_goto_table_impl.o
./interpreter_goto_table_impl.ii
