On Tue, Aug 4, 2020 at 2:18 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This middle-end patch teaches fold/match to recognize the idiom for
> a highpart multiplication and represent it internally as a
> MULT_HIGHPART_EXPR tree code.  At RTL expansion time, the compiler
> will try using an appropriate instruction (sequence) provided
> by the backend, but if that fails, this patch now provides a fallback
> by synthesizing a suitable sequence using either a widening multiply
> or a multiplication in a wider mode [matching the original tree].
>
> The benefit of this internal canonicalization is that it allows GCC
> to generate muldi3_highpart instructions even on targets that require
> a libcall to perform TImode multiplications.  Currently the RTL
> optimizers can recognize highpart multiplications in combine, but
> this matching fails when the multiplication requires a libcall.
> Rather than attempt to do something via REG_EQUAL_NOTEs, a clever
> solution is to make more use of the MULT_HIGHPART_EXPR tree code
> in the tree optimizers.
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check", and on nvptx-none with a "make"
> and "make -k check", both with no new failures.  There's an
> additional target-specific test in the nvptx patch to support
> "mul.hi.s64" and "mul.hi.u64" that I'm just about to post, but
> this code is already well exercised during bootstrap by libgcc.
>
> Ok for mainline?
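
For reference, the idiom described in the quoted message corresponds to C source of roughly the following shape: widen both operands, multiply, shift right by the precision of the narrow type, and truncate. This is only a sketch. The function names (mulhi_u32, mulhi_s32, mulhi_u64) are invented for illustration and are not taken from the patch or its testcase, and the 64-bit variant assumes the GCC extension type unsigned __int128 is available on the target.

```c
#include <stdint.h>

/* Unsigned 32-bit high-part multiply: the product is formed in a type
   twice as wide, and the upper 32 bits are returned.  */
uint32_t mulhi_u32 (uint32_t x, uint32_t y)
{
  return (uint32_t) (((uint64_t) x * (uint64_t) y) >> 32);
}

/* Signed variant.  This relies on >> of a negative value being an
   arithmetic shift, which GCC documents but ISO C leaves
   implementation-defined.  */
int32_t mulhi_s32 (int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) x * (int64_t) y) >> 32);
}

#ifdef __SIZEOF_INT128__
/* 64-bit variant: the wide multiplication is a TImode multiply, which
   is the case the quoted message says may need a libcall.  */
uint64_t mulhi_u64 (uint64_t x, uint64_t y)
{
  return (uint64_t) (((unsigned __int128) x * y) >> 64);
}
#endif
```

Whether a given compiler version turns any of these into a single high-part multiply instruction depends on the target and on the outcome of this thread, so no particular code generation is claimed here.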

So currently MULT_HIGHPART_EXPR is one of the operations that only
appear when there's suitable target support (it seems to be used
solely by vectorization).  Is there any benefit to matching the
pattern when the target doesn't support it?  That is, how about
guarding it with can_mult_highpart_p ()?

Also note the usual issue with early introduction of such
MULT_HIGHPART_EXPRs, which aren't widely supported in optimization
passes - it's one of the things I'd rather do as part of instruction
selection before RTL expansion (where it could also directly
synthesize a direct internal fn for the respective optab).
So do you see any advantage in synthesizing those a) early,
b) for targets that do not natively support it?

We now have the ISEL pass and if you write the pattern as

(match (highpart_multiply @0 @1)
 (convert (rshift (mult:s (convert@3 @0) (convert @1)) INTEGER_CST@2))
 (if (INTEGRAL_TYPE_P (type)
      && INTEGRAL_TYPE_P (TREE_TYPE (@3))
      && types_match (type, TREE_TYPE (@0))
      && types_match (type, TREE_TYPE (@1))
      && (TYPE_PRECISION (TREE_TYPE (@3)) >= 2 * TYPE_PRECISION (type))
      && tree_fits_uhwi_p (@2)
      && tree_to_uhwi (@2) == TYPE_PRECISION (type)
      && TYPE_SIGN (TREE_TYPE (@3)) == TYPE_SIGN (type))))

you can match that in C++ code doing

  extern bool gimple_highpart_multiply (tree, tree *, tree (*)(tree));
  tree res_ops[2];
  if (gimple_highpart_multiply (gimple_assign_lhs (stmt), res_ops, NULL))
    {
      /* res_ops[0] is now @0 and res_ops[1] is @1 and you know
         'stmt' computes a highpart multiply.  */
    }

Thanks,
Richard.

>
> 2020-08-04  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * match.pd (((wide)x * (wide)y)>>C -> mult_highpart): New
>         simplification/canonicalization to recognize MULT_HIGHPART_EXPR.
>         * optabs.c (expand_mult_highpart_1): New function to expand
>         MULT_HIGHPART_EXPR as a widening or a wide multiplication
>         followed by a right shift (or a gen_highpart subreg).
>         (expand_mult_highpart): Call the above function if the target
>         doesn't provide a suitable optab.
>
> gcc/testsuite/ChangeLog
>         * gcc.dg/fold-mult-highpart-1.c: New test.
>
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
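
To make the conditions in the suggested match pattern concrete, here is a small C sketch of one source form that appears to satisfy them and one that does not. The function names (hi16_matching, not_highpart) are hypothetical, and the comments describe how the predicate in the reply reads, not verified compiler behaviour.

```c
#include <stdint.h>

/* Appears to satisfy every condition in the predicate: the result type
   equals the operand type (16 bits), the wide type has at least twice
   that precision (32 bits), both are unsigned, and the shift count
   equals the narrow precision (16).  */
uint16_t hi16_matching (uint16_t x, uint16_t y)
{
  return (uint16_t) (((uint32_t) x * (uint32_t) y) >> 16);
}

/* Should not be treated as a high-part multiply: the shift count is 15,
   so the "tree_to_uhwi (@2) == TYPE_PRECISION (type)" test fails and
   the result mixes bits from both halves of the product.  */
uint16_t not_highpart (uint16_t x, uint16_t y)
{
  return (uint16_t) (((uint32_t) x * (uint32_t) y) >> 15);
}
```

For x = y = 0xFFFF the product is 0xFFFE0001, so the first function returns the upper half, 0xFFFE, while the second returns 0xFFFC.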