On Tue, Aug 11, 2020 at 08:01:50PM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Aug 11, 2020 at 12:23:07PM -0400, Michael Meissner wrote:
> > + /* See if we can use the ISA 3.1 min/max/compare instructions for IEEE
> > + 128-bit floating point. At present, don't worry about doing
> > conditional
> > + moves with different types for the comparison and movement (unlike
> > SF/DF,
> > + where you can do a conditional test between double and use float as
> > the
> > + if/then parts. */
>
> Why is that? That makes the code quite different too (harder to
> review), but also, it could just use existing code more.
It is a combinatorial expansion problem.
In order to do:
double cmove (double x, double y, float a, float b)
{
return (a == b) ? x : y;
}
You need two pair of SFDF iterators. You use SFDF for the target, and SFDF2
for the comparison.
(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_p9"
[(set (match_operand:SFDF 0 "vsx_register_operand"
"=&<SFDF:Fv>,<SFDF:Fv>")
(if_then_else:SFDF
(match_operator:CCFP 1 "fpmask_comparison_operator"
[(match_operand:SFDF2 2 "vsx_register_operand"
"<SFDF2:Fv>,<SFDF2:Fv>")
(match_operand:SFDF2 3 "vsx_register_operand"
"<SFDF2:Fv>,<SFDF2:Fv>")])
(match_operand:SFDF 4 "vsx_register_operand"
"<SFDF:Fv>,<SFDF:Fv>")
(match_operand:SFDF 5 "vsx_register_operand"
"<SFDF:Fv>,<SFDF:Fv>")))
(clobber (match_scratch:V2DI 6 "=0,&wa"))]
"TARGET_P9_MINMAX"
"#"
""
[(set (match_dup 6)
(if_then_else:V2DI (match_dup 1)
(match_dup 7)
(match_dup 8)))
(set (match_dup 0)
(if_then_else:SFDF (ne (match_dup 6)
(match_dup 8))
(match_dup 4)
(match_dup 5)))]
{
if (GET_CODE (operands[6]) == SCRATCH)
operands[6] = gen_reg_rtx (V2DImode);
operands[7] = CONSTM1_RTX (V2DImode);
operands[8] = CONST0_RTX (V2DImode);
}
[(set_attr "length" "8")
(set_attr "type" "vecperm")])
This means there are 4 versions of the pattern:
1: DF target, DF comparison
2: DF target, SF comparison
3: SF target, DF comparison
4: SF target, SF comparison
The dueling iterators was added in May 26th, 2016. I believe we had separate
insns for the 4 cases for fsel before I added the xscmpeqdp, etc. support.
To grow this for IEEE 128, you would need two iterators, each with 4 elements
(DF, SF, KF, and optionally TF). Thus you would need 16 patterns to represent
all of the patterns. I can do this, I just didn't think it was worth it.
In addition, it becomes more involved due to constraints. Currently for SF/DF
you essentially want to use any vector register at this point (when I added it
in 2016, there was still the support for limiting whether SF/DF could use the
Altivec registers). But for IEEE 128-bit fp types, you want "v" for the
registers used for comparison. You might want "v" for the contitional move
result, or you might want "wa".
I looked at combining the SF/DF and IEEE 128-bit cases, but it was becoming too
complex to describe these cases. It is doable, but it makes the diffs even
harder to read.
> > +;; IEEE 128-bit min/max
> > +(define_insn "s<minmax><mode>3"
> > + [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
> > + (fp_minmax:IEEE128
> > + (match_operand:IEEE128 1 "altivec_register_operand" "v")
> > + (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
> > + "TARGET_FLOAT128_HW && TARGET_POWER10 && FLOAT128_IEEE_P (<MODE>mode)"
> > + "xs<minmax>cqp %0,%1,%2"
> > + [(set_attr "type" "fp")
> > + (set_attr "size" "128")])
>
> So why do we want the asymmetrical xsmincqp instead of xsminqp? This
> should be documented at the very least. (We also should have min/max
> that work correctly without -ffast-math, of course :-( ).
We don't have an xsminqp or xsmaxqp instruction in power10. We only have
xsmincqp and xsmaxcqp instructions.
> > +;; IEEE 128-bit conditional move. At present, don't worry about doing
> > +;; conditional moves with different types for the comparison and movement
> > +;; (unlike SF/DF, where you can do a conditional test between double and
> > use
> > +;; float as the if/then parts.
>
> (Unmatched brackets). So why is this? "At present" doesn't belong in
> the code btw, but in your patch description.
Typo.
> > +(define_insn_and_split "*mov<mode>cc_hardware"
>
> Please don't use meaningless names like this.
>
> > +(define_insn "*xxsel<mode>"
Yes, this was more due to the fact that I was cloning the SF/DF code, and that
doesn't have XXSEL support. I'll look at that.
> This already exists for other vector modes. Put it together with that
> please. Same for all other insn patterns.
>
> Ideally you can make those one pattern then, even.
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> > @@ -0,0 +1,70 @@
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
> > +/* { dg-final { scan-assembler-not "xscmpuqp" } } */
> > +/* { dg-final { scan-assembler "xscmpeqqp" } } */
> > +/* { dg-final { scan-assembler "xscmpgtqp" } } */
> > +/* { dg-final { scan-assembler "xscmpgeqp" } } */
> > +/* { dg-final { scan-assembler "xsmaxcqp" } } */
> > +/* { dg-final { scan-assembler "xsmincqp" } } */
> > +/* { dg-final { scan-assembler "xxsel" } } */
>
> Use \m \M please. Don't ask for powerpc* in gcc.target/powerpc/ (it is
> implied there). Why does it need lp64? There should be some test for
> __float128, instead.
>
> All in all, this is very hard to review :-(
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: [email protected], phone: +1 (978) 899-4797