https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |linkw at gcc dot gnu.org
           Keywords|                            |internal-improvement
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |guihaoc at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
We already have expand_builtin_interclass_mathfn to expand these functions.
When they don't have an optab defined, it seems fine to generate there the RTL
equivalent of what fold_builtin_interclass_mathfn produces. However, for
maintainability, IMHO it's better to reuse the tree expression built by
fold_builtin_interclass_mathfn, so that we only have one place for such
folding. It would look something like:

@@ -2534,6 +2536,20 @@ expand_builtin_interclass_mathfn (tree exp, rtx target)
           && maybe_emit_unop_insn (icode, ops[0].value, op0, UNKNOWN))
         return ops[0].value;

+      location_t loc = EXPR_LOCATION (exp);
+      tree fold_res
+        = fold_builtin_interclass_mathfn (loc, fndecl, orig_arg, false);
+
+      if (fold_res)
+        {
+          op0 = expand_expr (fold_res, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+          tree rtype = TREE_TYPE (TREE_TYPE (fndecl));
+          machine_mode rmode = TYPE_MODE (rtype);
+          if (rmode != GET_MODE (op0))
+            op0 = convert_to_mode (rmode, op0, 0);
+          return op0;
+        }
+
       delete_insns_since (last);
       CALL_EXPR_ARG (exp, 0) = orig_arg;

Unfortunately, since fold_builtin_interclass_mathfn serves both the front end
and the middle end, it can produce tree codes such as TRUTH_NOT_EXPR, which
expand_expr doesn't support. To make it work, we can replace TRUTH_NOT_EXPR
with BIT_NOT_EXPR (as in fold_builtin_unordered_cmp), but other codes such as
TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR (used for ibmlongdouble) can't be
replaced with BIT_AND_EXPR and BIT_OR_EXPR because of their short-circuit
semantics. I tried using COND_EXPR for them instead, but testing a case with
ibmlongdouble showed there are still some gaps from the original folding code.
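
To illustrate the short-circuit point, here is a minimal sketch in plain C
rather than GCC internals. The helper names (rhs, andif_form, bit_and_form,
cond_form, demo) are hypothetical and exist only for this example. It shows
that a conditional form evaluates the second operand exactly when the
short-circuit form does, while the bitwise form always evaluates it.

```c
#include <assert.h>

/* Counts how often the second operand gets evaluated.  */
static int calls;

/* Hypothetical stand-in for the second operand of the test.  */
static int rhs (int v) { calls++; return v != 0; }

/* Analogous to TRUTH_ANDIF_EXPR: rhs is skipped when a is false.  */
static int andif_form (int a, int b) { return a && rhs (b); }

/* Analogous to BIT_AND_EXPR: rhs is always evaluated.  */
static int bit_and_form (int a, int b) { return (a != 0) & rhs (b); }

/* Analogous to COND_EXPR <a, rhs, 0>: rhs is skipped when a is false.  */
static int cond_form (int a, int b) { return a ? rhs (b) : 0; }

/* Compare value and evaluation count of the three forms on all inputs.  */
void demo (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        calls = 0;
        int r1 = andif_form (a, b);
        int n1 = calls;

        calls = 0;
        int r2 = cond_form (a, b);
        int n2 = calls;

        calls = 0;
        int r3 = bit_and_form (a, b);
        int n3 = calls;

        assert (r1 == r2 && n1 == n2); /* same value, same evaluations */
        assert (r1 == r3);             /* same value ...  */
        assert (n3 == 1);              /* ... but rhs always evaluated */
        assert (n1 == (a ? 1 : 0));    /* rhs evaluated only if a holds */
      }
}
```

The values agree in all three forms; only the evaluation of the second operand
differs, which matters when that operand may trap or has side effects.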

I also tried a hackish approach: force the tree expression to GIMPLE statements
and expand those statements one by one. But it creates more SSA names than
before and ICEs in the SSA-to-RTX handling, so I'm not sure whether it's a
direction worth digging into.

I'm looking for suggestions here: is there some existing practice to follow?
Which is preferred: expanding from the folded tree expression, or generating
the equivalent RTX directly? If the former, should we allow some difference
from the original folding (the FAILs may be rare), or experiment with some
other approaches?
