Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather then doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
> > counts were the same for LE accross processor versions but there are a
> > few instruction counts that vary with BE and LE.
>
> But the original test case only checks for cpu-types (processor version)
> but not for endianness, it means for the bif usages, there should not be
> different for endianness.  Why does this changes with your new test case?
> Could you have a further look and make it consistent with some adjustment
> if possible?  As we know, checking insn counts sometimes are fragile, so
> I think we should try our best to make it as robust as possible in the
> first place.
>
> Besides, the original case also have some differences between p7/p8 and
> p9.
>

There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

   diff vsx-vector-6.p8.c vsx-vector-6.p9.c
   3,4c3,4
   < /* { dg-require-effective-target powerpc_p8vector_ok } */
   < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
   ---
   > /* { dg-require-effective-target powerpc_p9vector_ok } */
   > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
   12c12
   < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
   ---
   > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
   23d22
   < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
   37c36
   < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
   ---
   > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see that the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp and
xvsubdp instruction count checks differ between the two architectures.

I then wrote a script to compile the CPU-specific test on Power 8,
Power 9 and Power 10 and then grep for the above list of instructions.
Running the script on each system, I get:

                 Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
                 (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
   instruction   count         count       count       count       count
   vperm         1             1           0           0           0
   vpermr        0             0           0           0           0
   xxpermr       0             0           1           0           1
   xvmsubadp     1             0           1           1           1
   xvmsubmdp     0             1           0           0           0
   xvsubdp       1             1           1           1           1

From the diff we see

   { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This check matches either xvmsubadp or xvmsubmdp, so it picks up the
correct instruction for LE versus BE and "masks" the LE/BE difference.
I changed the check in vsx-vector-6-func-3op.c to match.  This
eliminates the LE and BE checks and reduces the number of specific
checks.

In vsx-vector-6-func-3op.c, the new test checks the counts for xxpermdi,
which the original test does not check.  The checks for xxpermdi are not
needed, as they are not directly related to the builtin tests, so I
removed them.

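As a rough sketch (not the actual test source), this is how a single
combined xvmsub[am]dp check can sit next to the function it covers.  The
function name test_msub and the use of GCC's generic vector extension in
place of the altivec.h builtins are assumptions made for illustration
only, so that the fragment also compiles on non-PowerPC hosts.  Whether
this exact source yields the instruction depends on the target and
options, so treat the directive as illustrative.

```c
/* Sketch only: the function name and the generic vector type are
   assumptions; the real tests exercise the altivec.h builtins.  */

/* Directive copied from the diff above.  The [am] class accepts either
   xvmsubadp or xvmsubmdp, so one count can serve both LE and BE.  */
/* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical test function: element-wise a * b - c.  */
v2df
test_msub (v2df a, v2df b, v2df c)
{
  return a * b - c;
}
```

Because the regex accepts both forms of the instruction, the count does
not need an LE or BE target selector.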
Looking at the LE/BE checks in the other test file,
vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
not checked in the original test.  The functions where these
instructions are used get inlined.  On LE, the instructions show up in
the inlined code as well as in what appears to be the binary for the
original, non-inlined function.  As best I can see, the binary for the
original function is dead code; I don't see any calls to it.  It seems
like it shouldn't be there, as removing it would make the binary
smaller.  On BE, I don't see the binary for the original non-inlined
function.  I had played with putting -Wno-inline on the command line but
that didn't seem to make any difference.  However, your suggestion of
__attribute__ ((noipa)) does prevent the inlining, and we don't get the
second copy of the instructions showing up.  Preventing the inlining
eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

The instruction count test for xxlor in vsx-vector-6-func-2lop.c differs
on LE and BE.  I believe the instruction is used with loads to reorder
the data.  I don't see any way to get around the extra xxlor
instructions and still verify that the vec_or builtin test generates the
instruction.

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsub*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.

We could go back to checking for specific instructions being generated
on Power 8, Power 9 and Power 10 if you prefer not to use checks that
cover multiple flavors of a given instruction across different CPU
types.

FYI, I eliminated the function call to do the various tests.  Instead, I
modified the macro to generate a function call to do the test and check
the results.

                    Carl