So, in the end, is `@fastmath` supposed to be adding FMA? Should I open an issue?
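
For reference, the test in my gist boils down to something like this (`k` is the new `@fastmath` variant discussed below):

    f(x) = 2.0x + 3.0            # plain multiply + add
    g(x) = muladd(x, 2.0, 3.0)   # fusion allowed, but not required
    h(x) = fma(x, 2.0, 3.0)      # fusion required
    k(x) = @fastmath 2.0x + 3.0  # fast-math flags; this is the case in question

    @code_native f(4.0)
    @code_native g(4.0)
    @code_native h(4.0)
    @code_native k(4.0)   # no vfmadd shows up here either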
On Wednesday, September 21, 2016 at 7:11:14 PM UTC-7, Yichao Yu wrote:
> On Wed, Sep 21, 2016 at 9:49 PM, Erik Schnetter <schn...@gmail.com> wrote:
>> I confirm that I can't get Julia to synthesize a `vfmadd` instruction
>> either... Sorry for sending you on a wild goose chase.
>
> -march=haswell does the trick for C (both clang and gcc).
> The necessary bit for the machine IR optimization (this is not an LLVM
> IR optimization pass) to do this is the llc option -mcpu=haswell plus
> the function attribute unsafe-fp-math=true.
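
To make Yichao's C observation concrete, this is the sort of minimal file I believe he means (file and function names are mine, not from his message):

    /* fma_check.c -- minimal version of the C experiment described above.
     *
     *   gcc   -O2 -march=haswell -S -o - fma_check.c
     *   clang -O2 -march=haswell -S -o - fma_check.c
     *
     * With -march=haswell the assembly should contain a fused instruction
     * such as vfmadd213sd instead of separate vmulsd/vaddsd.  Depending on
     * the compiler version you may also need -ffp-contract=fast. */
    double axpb(double x) { return 2.0 * x + 3.0; }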
>> On Wed, Sep 21, 2016 at 9:33 PM, Yichao Yu <yyc...@gmail.com> wrote:
>>> On Wed, Sep 21, 2016 at 9:29 PM, Erik Schnetter <schn...@gmail.com> wrote:
>>>> On Wed, Sep 21, 2016 at 9:22 PM, Chris Rackauckas <rack...@gmail.com> wrote:
>>>>> I'm not seeing `@fastmath` apply fma/muladd. I rebuilt the sysimg,
>>>>> and now I get results where g and h apply muladd/fma in the native
>>>>> code, but a new function k, which is `@fastmath` inside of f, does
>>>>> not apply muladd/fma.
>>>>>
>>>>> https://gist.github.com/ChrisRackauckas/b239e33b4b52bcc28f3922c673a25910
>>>>>
>>>>> Should I open an issue?
>>>>
>>>> In your case, LLVM apparently thinks that `x + x + 3` is faster to
>>>> calculate than `2x + 3`. If you use a less round number than `2`
>>>> multiplying `x`, you might see a different behaviour.
>>>
>>> I've personally never seen llvm create fma from mul and add. We might
>>> not have the llvm passes enabled, if LLVM is capable of doing this at
>>> all.
>>>
>>>>> Note that this is on v0.6 Windows. On Linux the sysimg isn't
>>>>> rebuilding for some reason, so I may need to just build from source.
>>>>>
>>>>> On Wednesday, September 21, 2016 at 6:22:06 AM UTC-7, Erik Schnetter wrote:
>>>>>> On Wed, Sep 21, 2016 at 1:56 AM, Chris Rackauckas <rack...@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>> First of all, does LLVM automatically fma or muladd expressions
>>>>>>> like `a1*x1 + a2*x2 + a3*x3 + a4*x4`? Or is it required that one
>>>>>>> explicitly use `muladd` and `fma` on these kinds of expressions
>>>>>>> (is there a macro for making this easier)?
>>>>>>
>>>>>> Yes, LLVM will use fma machine instructions -- but only if they
>>>>>> lead to the same round-off error as using separate multiply and add
>>>>>> instructions. If you do not care about the details of conforming to
>>>>>> the IEEE standard, then you can use the `@fastmath` macro, which
>>>>>> enables several optimizations, including this one. This is
>>>>>> described in the manual
>>>>>> <http://docs.julialang.org/en/release-0.5/manual/performance-tips/#performance-annotations>.
>>>>>>
>>>>>>> Secondly, I am wondering if my setup is not applying these
>>>>>>> operations correctly. Here's my test code:
>>>>>>>
>>>>>>> f(x) = 2.0x + 3.0
>>>>>>> g(x) = muladd(x, 2.0, 3.0)
>>>>>>> h(x) = fma(x, 2.0, 3.0)
>>>>>>>
>>>>>>> @code_llvm f(4.0)
>>>>>>> @code_llvm g(4.0)
>>>>>>> @code_llvm h(4.0)
>>>>>>>
>>>>>>> @code_native f(4.0)
>>>>>>> @code_native g(4.0)
>>>>>>> @code_native h(4.0)
>>>>>>>
>>>>>>> Computer 1
>>>>>>>
>>>>>>> Julia Version 0.5.0-rc4+0
>>>>>>> Commit 9c76c3e* (2016-09-09 01:43 UTC)
>>>>>>> Platform Info:
>>>>>>>   System: Linux (x86_64-redhat-linux)
>>>>>>>   CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
>>>>>>>   WORD_SIZE: 64
>>>>>>>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
>>>>>>>   LAPACK: libopenblasp.so.0
>>>>>>>   LIBM: libopenlibm
>>>>>>>   LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)
>>>>>>
>>>>>> This looks good; the "broadwell" architecture that LLVM uses should
>>>>>> imply the respective optimizations. Try with `@fastmath`.
>>>>>>
>>>>>> -erik
>>>>>>
>>>>>>> (the COPR nightly on CentOS 7) with
>>>>>>>
>>>>>>> [crackauc@crackauc2 ~]$ lscpu
>>>>>>> Architecture:          x86_64
>>>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>>>> Byte Order:            Little Endian
>>>>>>> CPU(s):                16
>>>>>>> On-line CPU(s) list:   0-15
>>>>>>> Thread(s) per core:    1
>>>>>>> Core(s) per socket:    8
>>>>>>> Socket(s):             2
>>>>>>> NUMA node(s):          2
>>>>>>> Vendor ID:             GenuineIntel
>>>>>>> CPU family:            6
>>>>>>> Model:                 79
>>>>>>> Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
>>>>>>> Stepping:              1
>>>>>>> CPU MHz:               1200.000
>>>>>>> BogoMIPS:              6392.58
>>>>>>> Virtualization:        VT-x
>>>>>>> L1d cache:             32K
>>>>>>> L1i cache:             32K
>>>>>>> L2 cache:              256K
>>>>>>> L3 cache:              25600K
>>>>>>> NUMA node0 CPU(s):     0-7
>>>>>>> NUMA node1 CPU(s):     8-15
>>>>>>>
>>>>>>> I get the output
>>>>>>>
>>>>>>> define double @julia_f_72025(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = fmul double %0, 2.000000e+00
>>>>>>>   %2 = fadd double %1, 3.000000e+00
>>>>>>>   ret double %2
>>>>>>> }
>>>>>>>
>>>>>>> define double @julia_g_72027(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>   ret double %1
>>>>>>> }
>>>>>>>
>>>>>>> define double @julia_h_72029(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>   ret double %1
>>>>>>> }
>>>>>>>
>>>>>>>     .text
>>>>>>> Filename: fmatest.jl
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>> Source line: 1
>>>>>>>     addsd   %xmm0, %xmm0
>>>>>>>     movabsq $139916162906520, %rax  # imm = 0x7F40C5303998
>>>>>>>     addsd   (%rax), %xmm0
>>>>>>>     popq    %rbp
>>>>>>>     retq
>>>>>>>     nopl    (%rax,%rax)
>>>>>>>     .text
>>>>>>> Filename: fmatest.jl
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>> Source line: 2
>>>>>>>     addsd   %xmm0, %xmm0
>>>>>>>     movabsq $139916162906648, %rax  # imm = 0x7F40C5303A18
>>>>>>>     addsd   (%rax), %xmm0
>>>>>>>     popq    %rbp
>>>>>>>     retq
>>>>>>>     nopl    (%rax,%rax)
>>>>>>>     .text
>>>>>>> Filename: fmatest.jl
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>>     movabsq $139916162906776, %rax  # imm = 0x7F40C5303A98
>>>>>>> Source line: 3
>>>>>>>     movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
>>>>>>>     movabsq $139916162906784, %rax  # imm = 0x7F40C5303AA0
>>>>>>>     movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
>>>>>>>     movabsq $139925776008800, %rax  # imm = 0x7F43022C8660
>>>>>>>     popq    %rbp
>>>>>>>     jmpq    *%rax
>>>>>>>     nopl    (%rax)
>>>>>>>
>>>>>>> It looks like explicit muladd or not ends up at the same native
>>>>>>> code, but is that native code actually doing an fma? The fma
>>>>>>> native code is different, but from a discussion on Gitter it seems
>>>>>>> that might be a software FMA? This computer is set up with a BIOS
>>>>>>> setting like "LAPACK optimized" or something like that, so is that
>>>>>>> messing with something?
>>>>>>>
>>>>>>> Computer 2
>>>>>>>
>>>>>>> Julia Version 0.6.0-dev.557
>>>>>>> Commit c7a4897 (2016-09-08 17:50 UTC)
>>>>>>> Platform Info:
>>>>>>>   System: NT (x86_64-w64-mingw32)
>>>>>>>   CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
>>>>>>>   WORD_SIZE: 64
>>>>>>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
>>>>>>>   LAPACK: libopenblas64_
>>>>>>>   LIBM: libopenlibm
>>>>>>>   LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
>>>>>>>
>>>>>>> on a 4770K i7 under Windows 10, I get the output
>>>>>>>
>>>>>>> ; Function Attrs: uwtable
>>>>>>> define double @julia_f_66153(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = fmul double %0, 2.000000e+00
>>>>>>>   %2 = fadd double %1, 3.000000e+00
>>>>>>>   ret double %2
>>>>>>> }
>>>>>>>
>>>>>>> ; Function Attrs: uwtable
>>>>>>> define double @julia_g_66157(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>   ret double %1
>>>>>>> }
>>>>>>>
>>>>>>> ; Function Attrs: uwtable
>>>>>>> define double @julia_h_66158(double) #0 {
>>>>>>> top:
>>>>>>>   %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
>>>>>>>   ret double %1
>>>>>>> }
>>>>>>>
>>>>>>>     .text
>>>>>>> Filename: console
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>> Source line: 1
>>>>>>>     addsd   %xmm0, %xmm0
>>>>>>>     movabsq $534749456, %rax        # imm = 0x1FDFA110
>>>>>>>     addsd   (%rax), %xmm0
>>>>>>>     popq    %rbp
>>>>>>>     retq
>>>>>>>     nopl    (%rax,%rax)
>>>>>>>     .text
>>>>>>> Filename: console
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>> Source line: 2
>>>>>>>     addsd   %xmm0, %xmm0
>>>>>>>     movabsq $534749584, %rax        # imm = 0x1FDFA190
>>>>>>>     addsd   (%rax), %xmm0
>>>>>>>     popq    %rbp
>>>>>>>     retq
>>>>>>>     nopl    (%rax,%rax)
>>>>>>>     .text
>>>>>>> Filename: console
>>>>>>>     pushq   %rbp
>>>>>>>     movq    %rsp, %rbp
>>>>>>>     movabsq $534749712, %rax        # imm = 0x1FDFA210
>>>>>>> Source line: 3
>>>>>>>     movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
>>>>>>>     movabsq $534749720, %rax        # imm = 0x1FDFA218
>>>>>>>     movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
>>>>>>>     movabsq $fma, %rax
>>>>>>>     popq    %rbp
>>>>>>>     jmpq    *%rax
>>>>>>>     nop
>>>>>>>
>>>>>>> This seems to be similar to the first result.
>>>>>>
>>>>>> --
>>>>>> Erik Schnetter <schn...@gmail.com>
>>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
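
Finally, the llc-level experiment Yichao describes should look roughly like the IR below. This is hand-written for illustration (the file and function names are mine), so take the exact spelling with a grain of salt:

    ; axpb.ll -- the same computation as f(x) = 2.0x + 3.0.
    ; Run:  llc -mcpu=haswell -o - axpb.ll
    ; With both -mcpu=haswell and the unsafe-fp-math attribute below, the
    ; output should contain a vfmadd instruction; dropping either one
    ; should leave separate vmulsd/vaddsd.
    define double @axpb(double %x) #0 {
    top:
      %m = fmul double %x, 2.000000e+00
      %a = fadd double %m, 3.000000e+00
      ret double %a
    }
    attributes #0 = { "unsafe-fp-math"="true" }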