Hi,
On 2021/6/9 07:25, Segher Boessenkool wrote:
> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
>> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
>> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for
>> vmrghlb.
> 
> (vmrglb)
> 
>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (
>> +      gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
>> +  else
>> +    emit_insn (
>> +      gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]));
> 
> Please don't indent like that, it doesn't match what we do elsewhere.
> For better or for worse (for worse imo), we use deep hanging indents.
> If you have to, you can do something like
> 
>   rtx insn;
>   if (BYTES_BIG_ENDIAN)
>     insn = gen_altivec_vmrghb_direct (operands[0], operands[1],
>                                       operands[2]);
>   else
>     insn = gen_altivec_vmrglb_direct (operands[0], operands[2],
>                                       operands[1]);
>   emit_insn (insn);
> 
> (this is better even, in that it has only one emit_insn), or even
> 
>   rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>                                    : gen_altivec_vmrglb_direct;
>   if (!BYTES_BIG_ENDIAN)
>     std::swap (operands[1], operands[2]);
>   emit_insn (fun (operands[0], operands[1], operands[2]));
> 
> Well, C++ does not allow that last example like that, sigh, so
> 
>   rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
>                                                 : gen_altivec_vmrglb_direct;
> 
> This is shorter than the other two options ;-)
Changed.
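To be concrete (the attached patch is authoritative), the altivec_vmrghb expander body now reads roughly like your last form:

  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
                                                : gen_altivec_vmrglb_direct;
  if (!BYTES_BIG_ENDIAN)
    std::swap (operands[1], operands[2]);
  emit_insn (fun (operands[0], operands[1], operands[2]));
  DONE;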
>> +(define_insn "altivec_vmrghb_direct"
>>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +       (vec_select:V16QI
> 
> This should be indented one space more.
"TARGET_ALTIVEC"
"@
- xxmrghw %x0,%x1,%x2
- vmrghw %0,%1,%2"
+ xxmrghw %x0,%x1,%x2
+ vmrghw %0,%1,%2"
The original indent was correct, please restore.
>> -    emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
>> +    emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> 
> When you see a mode as part of a pattern name, chances are that it will
> be a good candidate for using parameterized names with.  (But don't do
> that now, just keep it in mind as a nice cleanup to do).
OK.
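(A note on that later cleanup, only a sketch and not part of this patch: if the pattern were renamed to something like "@altivec_vmrghw_direct<mode>" with the '@' prefix, genemit would also produce an overloaded generator that takes the mode as its first argument, so the caller above could become

  emit_insn (gen_altivec_vmrghw_direct (V4SImode, operands[0], ve, vo));

without spelling out a per-mode name.)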
>> @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>         : CODE_FOR_altivec_vmrglh_direct),
>>        { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } },
>>      { OPTION_MASK_ALTIVEC,
>> -      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
>> -       : CODE_FOR_altivec_vmrglw_direct),
>> +      (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>> +       : CODE_FOR_altivec_vmrglw_direct_v4si),
> 
> The correct way is to align the ? and the : (or put everything on one
> line of course, if that fits).
> The parens around this are not needed btw, and are a distraction.
Changed.
>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>> @@ -317,10 +317,10 @@ int main ()
>>  /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
>>  /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
>> -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
>> +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
>>  /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
>> -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
>> -/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
>> +/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
>> +/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
>>  /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
>>  /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
>>  /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
>> @@ -347,7 +347,7 @@ int main ()
>>  /* { dg-final { scan-assembler-times "vspltb" 6 } } */
>>  /* { dg-final { scan-assembler-times "vspltw" 0 } } */
>>  /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
>> -/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
>> +/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
>>  /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
>>  /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
>>  /* { dg-final { scan-assembler-times "vsplth" 8 } } */
> 
> Are those changes correct?  It looks like a vmrglb became a vmrghb, and
> that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?
This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp",
so the scan-assembler counts also include the instruction pattern names
that -dp prints in the assembly comments.
1) The pattern name "vsx_xxmrghw_v4si" is replaced by
"altivec_vmrglw_direct_v4si/0", which no longer contains the string
"xxmrghw", so that count decreases from 8 to 4 (likewise for
"vsx_xxmrglw_v4si" and the "xxmrglw" count).
li 9,48 # 1282 [c=4 l=4] *movdi_internal64/3
- lxvd2x 0,31,9 # 31 [c=8 l=4] *vsx_lxvd2x4_le_v4si
- xxpermdi 0,0,0,2 # 32 [c=4 l=4] xxswapd_v4si
- xxmrglw 0,0,12 # 33 [c=4 l=4] vsx_xxmrghw_v4si
+ lxvd2x 12,31,9 # 31 [c=8 l=4] *vsx_lxvd2x4_le_v4si
+ xxpermdi 12,12,12,2 # 32 [c=4 l=4] xxswapd_v4si
+ xxmrglw 0,12,0 # 33 [c=4 l=4] altivec_vmrglw_direct_v4si/0
xxpermdi 0,0,0,2 # 35 [c=4 l=4] xxswapd_v4sf
Note that registers 0 and 12 are swapped in the lxvd2x above, but these
three new instructions produce the same result as before.
2) "*altivec_vmrglb_internal" is replaced by "altivec_vmrghb_direct"
with this patch, then vmrglb count decreases from 5 to 4 and vmrghb
increases from 5 to 6. (BYTES_BIG_ENDIAN is checked early in RTL
generation instead of final to remove the UNSPECs for potential
optimization through backend.)
li 9,928 # 1424 [c=4 l=4] *movdi_internal64/3
lxvd2x 0,31,9 # 416 [c=8 l=4] *vsx_lxvd2x16_le_V16QI
- xxpermdi 33,0,0,2 # 417 [c=4 l=4] xxswapd_v16qi
+ xxpermdi 32,0,0,2 # 417 [c=4 l=4] xxswapd_v16qi
li 9,944 # 1425 [c=4 l=4] *movdi_internal64/3
lxvd2x 0,31,9 # 418 [c=8 l=4] *vsx_lxvd2x16_le_V16QI
- xxpermdi 32,0,0,2 # 419 [c=4 l=4] xxswapd_v16qi
- vmrghb 0,0,1 # 420 [c=4 l=4] *altivec_vmrglb_internal
+ xxpermdi 33,0,0,2 # 419 [c=4 l=4] xxswapd_v16qi
+ vmrghb 0,1,0 # 420 [c=4 l=4] altivec_vmrghb_direct
xxpermdi 0,32,32,2 # 421 [c=4 l=4] xxswapd_v16qi
It doesn't seem necessary to also use \m and \M here to count only the
actual mnemonics, does it?
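(If we did want that, it would presumably just be a matter of writing
the regexps as {\mxxmrghw\M} etc., like the existing {\mvpkudus\M}
line, so that the -dp pattern names in the comments are not matched.)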
The updated patch is attached.