On Mon, Jul 17, 2017 at 10:51:21AM -0600, Sean McAllister wrote:
> When generating code for a simple inner loop (instantiated with
> std::complex<float>)
>
> template <typename cx>
> void __attribute__((noinline)) benchcore(const cx* __restrict__ aa,
> const cx* __restrict__ bb, const cx* __restrict__ cc, cx* __restrict__
> dd, cx uu, cx vv, size_t nn) {
>   for (ssize_t ii=0; ii < nn; ii++) {
>     dd[ii] = (
>       aa[ii]*uu +
>       bb[ii]*vv +
>       cc[ii]
>     );
>   }
> }
>
> g++ generates the following assembly code (g++ 7.1.0) (compiled with:
> g++ -I. test.cc -O3 -ggdb3 -o test)
[snipped]
>
> The interesting part is the two calls to __mulsc3, which the docs
> indicate computes complex multiplication according to Annex G of the
> C99 standard. This leads me to two questions.
>
> First, disassembling __mulsc3 doesn't seem to contain anything:
>
> (gdb) disassemble __mulsc3
> Dump of assembler code for function __mulsc3@plt:
> 0x0000000000400aa0 <+0>: jmpq *0x2035d2(%rip) # 0x604078
> 0x0000000000400aa6 <+6>: pushq $0xc
> 0x0000000000400aab <+11>: jmpq 0x4009d0
> End of assembler dump.
>
> What's the cause of this?

You are disassembling the PLT entry (note the symbol name
__mulsc3@plt), not the function itself. The PLT entry only redirects
to the real function, which is provided by libgcc (on my computer the
exact location is /lib/x86_64-linux-gnu/libgcc_s.so.1).

> Second, since I don't think I'll convince anyone to generate
> non-standard conforming code by default, could the default performance
> of complex multiplication be enhanced significantly by performing the
> isnan() checks required by Annex G and only calling the function to
> fix the results if they fail? That would move the function call
> overhead out of the critical path at least.

Gabriel
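
For reference, the fast-path idea in the second quoted question could be
sketched roughly as below. This is only an illustration, not what g++ 7.1.0
emits: mul_fast_path is a hypothetical helper name, and the sketch assumes
that the library's operator* (which ends up in __mulsc3 for
std::complex<float> in the build quoted above) is the place where the
Annex G fix-up happens. The straightforward product is computed inline, and
the slow call is made only when both parts of the result come out as NaN,
which is the condition under which the Annex G recovery applies.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (not part of GCC or libgcc): inline complex multiply
// with the Annex G recovery moved off the common path.
template <typename T>
inline std::complex<T> mul_fast_path(const std::complex<T>& x,
                                     const std::complex<T>& y) {
    const T a = x.real(), b = x.imag();
    const T c = y.real(), d = y.imag();

    // Textbook product: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
    const T re = a * c - b * d;
    const T im = a * d + b * c;

    // The fix-up is only needed when BOTH parts are NaN. In that rare case
    // defer to the library multiply, assumed here to do the Annex G work.
    if (std::isnan(re) && std::isnan(im)) {
        return x * y;
    }
    return std::complex<T>(re, im);
}
```

In the quoted loop this would be used as
dd[ii] = mul_fast_path(aa[ii], uu) + mul_fast_path(bb[ii], vv) + cc[ii];
so that the call happens only for the inputs that actually need fixing.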