Ah OK, thank you, I wasn't aware of the PLT indirection. If I run the program and break on __mulsc3, it disassembles as I'd expect.
On Mon, Jul 17, 2017 at 12:32 PM, Gabriel Paubert <paub...@iram.es> wrote:
> On Mon, Jul 17, 2017 at 10:51:21AM -0600, Sean McAllister wrote:
>> When generating code for a simple inner loop (instantiated with
>> std::complex<float>)
>>
>> template <typename cx>
>> void __attribute__((noinline)) benchcore(const cx* __restrict__ aa,
>> const cx* __restrict__ bb, const cx* __restrict__ cc, cx* __restrict__
>> dd, cx uu, cx vv, size_t nn) {
>>     for (ssize_t ii=0; ii < nn; ii++) {
>>         dd[ii] = (
>>             aa[ii]*uu +
>>             bb[ii]*vv +
>>             cc[ii]
>>         );
>>     }
>> }
>>
>> g++ generates the following assembly code (g++ 7.1.0) (compiled with:
>> g++ -I. test.cc -O3 -ggdb3 -o test)
>
> [snipped]
>>
>> The interesting part is the two calls to __mulsc3, which the docs
>> indicate computes complex multiplication according to Annex G of the
>> C99 standard. This leads me to two questions.
>>
>> First, disassembling __mulsc3 doesn't seem to contain anything:
>>
>> (gdb) disassemble __mulsc3
>> Dump of assembler code for function __mulsc3@plt:
>>    0x0000000000400aa0 <+0>:     jmpq   *0x2035d2(%rip)        # 0x604078
>>    0x0000000000400aa6 <+6>:     pushq  $0xc
>>    0x0000000000400aab <+11>:    jmpq   0x4009d0
>> End of assembler dump.
>>
>> What's the cause of this?
>
> That you are disassembling the PLT (note __mulsc3@plt), which redirects
> to the real function which is provided by libgcc (on my computer the
> exact location is /lib/x86_64-linux-gnu/libgcc_s.so.1).
>
>>
>> Second, since I don't think I'll convince anyone to generate
>> non-standard conforming code by default, could the default performance
>> of complex multiplication be enhanced significantly by performing the
>> isnan() checks required by Annex G and only calling the function to
>> fix the results if they fail? That would move the function call
>> overhead out of the critical path at least.
>
> Gabriel
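The second question above amounts to keeping the naive multiply inline and paying for the Annex G handling only when the result comes back as NaN+iNaN. A minimal C++ sketch of that idea, under stated assumptions: mul_fast and mul_fixup are hypothetical names, the fix-up logic is modeled on the example multiplication routine given in C99 Annex G (not on libgcc's actual __mulsc3 source), and it assumes IEEE semantics, so it is meaningless under -ffast-math, which lets the compiler assume NaNs never occur.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Hypothetical slow path, modeled on the C99 Annex G example routine:
// recover infinities that the naive formula turned into NaN+iNaN.
// Marked noinline/cold so it stays out of the hot loop.
static __attribute__((noinline, cold))
std::complex<float> mul_fixup(float a, float b, float c, float d) {
    // Partial products of the original operands (also used for the overflow test).
    const float ac = a * c, bd = b * d, ad = a * d, bc = b * c;
    bool recalc = false;
    if (std::isinf(a) || std::isinf(b)) {  // left operand is infinite:
        // "box" the infinity to +-1/+-0 and turn NaNs in the other operand into 0
        a = std::copysign(std::isinf(a) ? 1.0f : 0.0f, a);
        b = std::copysign(std::isinf(b) ? 1.0f : 0.0f, b);
        if (std::isnan(c)) c = std::copysign(0.0f, c);
        if (std::isnan(d)) d = std::copysign(0.0f, d);
        recalc = true;
    }
    if (std::isinf(c) || std::isinf(d)) {  // right operand is infinite: same treatment
        c = std::copysign(std::isinf(c) ? 1.0f : 0.0f, c);
        d = std::copysign(std::isinf(d) ? 1.0f : 0.0f, d);
        if (std::isnan(a)) a = std::copysign(0.0f, a);
        if (std::isnan(b)) b = std::copysign(0.0f, b);
        recalc = true;
    }
    if (!recalc && (std::isinf(ac) || std::isinf(bd) ||
                    std::isinf(ad) || std::isinf(bc))) {
        // Infinities came from overflow in the partial products: zero out NaNs.
        if (std::isnan(a)) a = std::copysign(0.0f, a);
        if (std::isnan(b)) b = std::copysign(0.0f, b);
        if (std::isnan(c)) c = std::copysign(0.0f, c);
        if (std::isnan(d)) d = std::copysign(0.0f, d);
        recalc = true;
    }
    if (recalc) {
        const float inf = std::numeric_limits<float>::infinity();
        return {inf * (a * c - b * d), inf * (a * d + b * c)};
    }
    return {ac - bd, ad + bc};  // a genuine NaN: keep the naive NaN+iNaN result
}

// Hypothetical fast path: naive (a+bi)(c+di) computed inline; the out-of-line
// fix-up is called only when both parts come out NaN (the Annex G trigger).
inline std::complex<float> mul_fast(std::complex<float> z, std::complex<float> w) {
    const float a = z.real(), b = z.imag(), c = w.real(), d = w.imag();
    const float x = a * c - b * d;
    const float y = a * d + b * c;
    if (__builtin_expect(std::isnan(x) && std::isnan(y), 0))
        return mul_fixup(a, b, c, d);
    return {x, y};
}
```

In a loop like benchcore one would write mul_fast(aa[ii], uu) instead of aa[ii]*uu. Whether the extra compare-and-branch is actually cheaper than the call to __mulsc3, or whether the loop then vectorizes, is untested here and would need benchmarking.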