On Mon, Apr 30, 2018 at 7:41 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
> Hi all,
>
> We can improve the performance of complex floating-point multiplications by
> inlining the expansion a bit more aggressively.
> We can inline complex x = a * b as:
>   x = (ar*br - ai*bi) + i(ar*bi + br*ai);
>   if (isunordered (__real__ x, __imag__ x))
>     x = __muldc3 (a, b); //Or __mulsc3 for single-precision
>
> That way, in the common case where no NaNs are produced, we avoid the libgcc
> call, and we fall back to the NaN-handling code in libgcc only if either
> component of the expansion is NaN.
>
> The implementation is in expand_complex_multiplication in tree-complex.c.
> The above expansion is used when optimising at -O1 and above and not
> optimising for size. At -O0 and -Os the single call to libgcc is emitted.
>
> For the code:
> __complex double
> foo (__complex double a, __complex double b)
> {
>   return a * b;
> }
>
> We will now emit at -O2 for aarch64:
> foo:
>         fmul    d16, d1, d3
>         fmul    d6, d1, d2
>         fnmsub  d5, d0, d2, d16
>         fmadd   d4, d0, d3, d6
>         fcmp    d5, d4
>         bvs     .L8
>         fmov    d1, d4
>         fmov    d0, d5
>         ret
> .L8:
>         stp     x29, x30, [sp, -16]!
>         mov     x29, sp
>         bl      __muldc3
>         ldp     x29, x30, [sp], 16
>         ret
>
> Previously this was just a branch to __muldc3.
>
> Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-unknown-linux-gnu.
>
> Ok for trunk? (GCC 9)
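To make the proposed fast path concrete, here is a minimal C sketch of the same idea at the source level, assuming a C99 compiler with <complex.h>. It is not the tree-complex.c implementation: mul_with_nan_fallback and make_complex are hypothetical helper names, and the fallback simply evaluates a * b, which (without -ffast-math or -fcx-limited-range) has full C99 Annex G semantics and stands in for the __muldc3 call.

```c
#include <complex.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: build re + i*im without computing im * I, which
   can produce a NaN real part when im is infinite.  Reading a union
   member other than the one last written is valid type punning in C.  */
static double complex
make_complex (double re, double im)
{
  union { double parts[2]; double complex z; } u;
  u.parts[0] = re;
  u.parts[1] = im;
  return u.z;
}

/* Hypothetical sketch of the proposed expansion: compute the textbook
   product and fall back to the full C99 multiplication only when either
   component is NaN.  *used_fallback reports which path ran.  */
static double complex
mul_with_nan_fallback (double complex a, double complex b,
                       bool *used_fallback)
{
  double ar = creal (a), ai = cimag (a);
  double br = creal (b), bi = cimag (b);

  /* Fast path: x = (ar*br - ai*bi) + i(ar*bi + br*ai).  */
  double xr = ar * br - ai * bi;
  double xi = ar * bi + br * ai;

  /* Same test as the patch: isunordered is true if either part is NaN.  */
  *used_fallback = isunordered (xr, xi);
  if (*used_fallback)
    /* Stand-in for the __muldc3 call: without -ffast-math or
       -fcx-limited-range, a * b has full C99 Annex G semantics.  */
    return a * b;

  return make_complex (xr, xi);
}
```

For example, (1 + 2i) * (3 + 4i) stays on the fast path and yields -5 + 10i, while (inf + 0i) * (0 + 1i) gives a NaN real part (inf*0 - 0*1) in the naive formula and therefore takes the fallback.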
+  /* If optimizing for size or not at all just do a libcall.  */
+  if (optimize == 0 || optimize_function_for_size_p (cfun))
+    {
+      expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR);
+      return;
+    }

Please use optimize_bb_for_size_p instead (get the BB from the mult stmt).

+/* Expand a complex multiplication or division to a libcall to the c99
+   compliant routines.  Unlike expand_complex_libcall create and insert
+   the call, assign it to an output variable and return that rather than
+   modifying existing statements in place.  */
+
+static tree
+insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar,
+                             tree ai, tree br, tree bi)
+{

Can you please try merging both functions instead?

This also exposes a possible issue: with -fnon-call-exceptions the original
multiplication may have EH edges. I think you want to side-step that by
using the libcall-only path in that case as well (stmt_can_throw_internal).

+      tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED);
+      tree isunordered_res = create_tmp_var (integer_type_node);
+      gimple *tmpr_unord_check
+        = gimple_build_call (isunordered_decl, 2, tmpr, tmpi);
+      gimple_call_set_lhs (tmpr_unord_check, isunordered_res);
+
+      gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT);
+      gimple *check
+        = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node,
+                             NULL_TREE, NULL_TREE);

Why use BUILT_IN_ISUNORDERED rather than a GIMPLE_COND with UNORDERED_EXPR?

Note again that this might trap/throw with -fsignaling-nans, so better avoid
this transform for flag_signaling_nans as well...

+      /* We have a conditional block with some assignments in cond_bb.
+         Wire up the PHIs to wrap up.  */
+      if (gimple_in_ssa_p (cfun))
+        {

We are always in SSA form here(?) (tree-complex.c could probably use some
TLC in this area).

+      /* If we are not worrying about NaNs expand to
+          (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
+      expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai,
+                                                   br, bi, &rr, &ri);

I think the function is badly named: this isn't about limited ranges, is it?
That also means we could dispatch to this simple variant not only for
flag_complex_method != 2 but also for !HONOR_NANS && !HONOR_INFINITIES.
Maybe that should be done as a follow-up.

Richard.

> Thanks,
> Kyrill
>
> 2018-04-30  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR tree-optimization/70291
>     * tree-complex.c (insert_complex_mult_libcall): New function.
>     (expand_complex_multiplication_limited_range): Likewise.
>     (expand_complex_multiplication): Expand floating-point complex
>     multiplication using the above.
>
> 2018-04-30  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR tree-optimization/70291
>     * gcc.dg/complex-6.c: New test.
>     * gcc.dg/complex-7.c: Likewise.
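The review suggestions above amount to a three-way choice for each floating-point complex multiplication. The following C sketch models that decision as a pure function; it is only an illustration, not GCC code. The enum, the struct and choose_mult_strategy are hypothetical, each boolean field stands in for the GCC predicate or flag named in its comment, and treating flag_complex_method == 2 as full C99 handling is an assumption drawn from the comparison in the review.

```c
#include <stdbool.h>

/* Hypothetical strategies; names are illustrative only.  */
enum mult_strategy
{
  MULT_NAIVE,           /* (ar*br - ai*bi) + i(ar*bi + br*ai) only.  */
  MULT_LIBCALL_ONLY,    /* Single __muldc3/__mulsc3 call.  */
  MULT_INLINE_FALLBACK  /* Naive formula, libcall only on NaN.  */
};

/* Hypothetical context: each field stands in for the GCC predicate
   or flag named in its comment.  */
struct mult_context
{
  int complex_method;          /* flag_complex_method (2 assumed = C99) */
  bool honor_nans;             /* HONOR_NANS (inner_type) */
  bool honor_infinities;       /* HONOR_INFINITIES (inner_type) */
  bool optimizing;             /* optimize != 0 */
  bool bb_optimized_for_size;  /* optimize_bb_for_size_p (bb) */
  bool stmt_can_throw;         /* stmt_can_throw_internal (stmt) */
  bool signaling_nans;         /* flag_signaling_nans */
};

static enum mult_strategy
choose_mult_strategy (const struct mult_context *c)
{
  /* No Annex G semantics requested, or neither NaNs nor infinities
     need honoring (the suggested follow-up): plain formula.  */
  if (c->complex_method != 2 || (!c->honor_nans && !c->honor_infinities))
    return MULT_NAIVE;

  /* -O0, blocks optimized for size, statements with EH edges and
     signaling NaNs all keep the single libcall.  */
  if (!c->optimizing || c->bb_optimized_for_size
      || c->stmt_can_throw || c->signaling_nans)
    return MULT_LIBCALL_ONLY;

  /* Otherwise: inline the naive product with an unordered check.  */
  return MULT_INLINE_FALLBACK;
}
```

Checking the naive case first reflects that the simple variant needs no NaN handling at all; the !honor_nans && !honor_infinities clause models the follow-up suggested in the review, not the posted patch.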