Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

Carl Love via Gcc-patches Thu, 29 Jun 2023 14:36:21 -0700

Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather then doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > instruction
> > counts were the same for LE accross processor versions but there
> > are a
> > few instruction counts that vary with BE and LE.
> 
> But the original test case only checks for cpu-types (processor
> version)
> but not for endianness, it means for the bif usages, there should not
> be
> different for endianness.  Why does this changes with your new test
> case?
> Could you have a further look and make it consistent with some
> adjustment
> if possible?  As we know, checking insn counts sometimes are fragile,
> so
> I think we should try our best to make it as robust as possible in
> the
> first place.
> 
> Besides, the original case also have some differences between p7/p8
> and
> p9.
>


There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

 diff vsx-vector-6.p8.c vsx-vector-6.p9.c
3,4c3,4
< /* { dg-require-effective-target powerpc_p8vector_ok } */
< /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
---
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
12c12
< /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
23d22
< /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
37c36
< /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
between the two architectures.  I then wrote a script to compile the
CPU specific test on Power 8, Power 9 and Power 10 architectures and
then grep for the above list of instructions.  If I run the scrip on P8
BE  and LE I get


            Power 8 BE    Power 8 LE   Power 9 LE   Power 9 BE    Power 10 LE*
           (makalu-lp1)    (genoa)     (marlin)      (nilram)   (ltcd97-lp3)
instruction   count         count        count         count        count
vperm          1              1            0             0            0
vpermr         0              0            0             0            0
xxpermr        0              0            1             0            1
xvmsubadp      1              0            1             1            1
xvmsubmdp      0              1            0             0            0
xvsubdp        1              1            1             1            1


>From the diff we see 

  { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE
so this "masks" the LE/BE difference.  I changed the check in vsx-
vector-6-func-3op.c to match.  This eliminates the LE and BE checks and
reduces the number of specific checks.

In vsx-vector-6-func-3op.c  The new test checks the counts for
xxpermdi, which the original test does not check.  The check for
xxpermdi are not needed.  They are not directly related to the builtin
tests.  I removed them.

Looking at the LE/BE checks in the other test file vsx-vector-6-func-
2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in
the original test.  The functions where these instructions are used get
inlined.  On LE, the binary instructions show up in the inlined code as
well as what appears to be the binary for the original, non-inlined
function.  Best I can see, the binary for the original function is dead
code.  I don't see any calls to it.  Seems like it shouldn't be there
as it would make the binary smaller. On BE, I don't see the binary for
the original non-inlined function.  

I had played with putting -Wno-inline on the command line but that
didn't seem to make any difference.  However, you suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up. The inlining eliminated the
LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

The instruction count test for xxlor in vsx-vector-6-func-2lop.c
differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
instruction is used with loads to reorder the data.  I don't see anyway
to get around the extra xxlor instructions and verify the vec_or
builtin test generates the instruction.  

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsumb*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.  We could go back to checking for
specific instructions being generated on Power 8, Power 9, Power 10 if
you prefer not using checks that cover multiple flavors of a given
instruction across different CPU types.  

FYI, I eliminated the function call to do the various tests.  Instead,
I modified the macro to generate a function call to do the test and
check the results.

                               Carl

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

Reply via email to