On Mon, May 31, 2021 at 06:32:04AM -0700, H.J. Lu wrote:
On Mon, May 31, 2021 at 6:26 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Mon, May 31, 2021 at 3:12 PM H.J. Lu <hjl.to...@gmail.com> wrote:
On Mon, May 31, 2021 at 5:46 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Mon, May 31, 2021 at 2:09 PM H.J. Lu <hjl.to...@gmail.com> wrote:
On Wed, May 26, 2021 at 10:28:16AM +0200, Richard Biener wrote:
-- Target Hook: rtx TARGET_GEN_MEMSET_VALUE (rtx DATA, scalar_int_mode
MODE)
This function returns the RTL of a register containing
'GET_MODE_SIZE (MODE)' consecutive copies of the unsigned char
value given in the RTL register DATA. For example, if MODE is 4
bytes wide, return the RTL for 0x01010101*DATA.
For this one I wonder if it should be an optab instead. Couldn't you
use the existing vec_duplicate for this by using (paradoxical) subregs
like (subreg:TI (vec_duplicate:VnQI (subreg:VnQI (reg:QI ...)))?
I tried. It doesn't even work on x86. See:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570661.html
Not sure what I should read from there...
There are special cases to subreg HI, SI and DI modes of TI mode in
ix86_gen_memset_value_from_prev. simplify_gen_subreg doesn't
work here. Each backend may need its own special handling.
OK, I guess I'm not (RTL) qualified enough to further review these parts,
sorry. Since we're doing code generation the canonical way to communicate
with backends should be optabs, not some set of disconnected target hooks.
But as said, I probably don't know enough of RTL to see why it's the only way.
Richard.
Here is the patch to add optabs instead. Does it look OK?
Thanks.
H.J.
---
Add 2 optabs:
1. integer_extract: Extract lower bit value from the integer value in
TImode, OImode or XImode.
That sounds very specific, esp. the restriction to {TI,OI,XI}mode.
It also sounds like it matches (subreg:{TI,OI,XI} (...) 0). There are
existing target hooks verifying subreg validity - why's that not a good
fit here? ISTR you say gen_lowpart () doesn't work (or was it
simplify_gen_subreg?), why's that so?
{TI,OI,XI}mode are storage only integer types. subreg doesn't work
well on them. I got
[hjl@gnu-cfl-2 pieces]$ cat s2.i
extern void *ops;
void
foo (int c)
{
__builtin_memset (ops, c, 34);
}
[hjl@gnu-cfl-2 pieces]$ make s2.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/
-O2 -march=haswell -S s2.i
during RTL pass: reload
s2.i: In function ‘foo’:
s2.i:7:1: internal compiler error: maximum number of generated reload
insns per insn achieved (90)
7 | }
| ^
0x1050734 lra_constraints(bool)
/export/gnu/import/git/gitlab/x86-gcc/gcc/lra-constraints.c:5091
0x1039536 lra(_IO_FILE*)
/export/gnu/import/git/gitlab/x86-gcc/gcc/lra.c:2336
0xfe1140 do_reload
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5822
0xfe162e execute
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:6008
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make: *** [Makefile:32: s2.s] Error 1
[hjl@gnu-cfl-2 pieces]$
due to
(insn 12 11 0 (set (mem:HI (plus:DI (reg/f:DI 84)
(const_int 32 [0x20])) [0 MEM <char[1:34]> [(void
*)ops.0_1]+32 S2 A8])
(subreg:HI (reg:OI 51 xmm15) 0)) "s2.i":6:3 -1
(nil))
The new optab gives us
(insn 12 11 13 2 (set (reg:TI 88)
(reg:TI 51 xmm15)) "s2.i":6:3 -1
(nil))
(insn 13 12 14 2 (set (reg:SI 89)
(subreg:SI (reg:TI 88) 0)) "s2.i":6:3 -1
(nil))
(insn 14 13 15 2 (set (reg:HI 87)
(subreg:HI (reg:SI 89) 0)) "s2.i":6:3 -1
(nil))
that looks odd to me - what's the final result after LRA? I think
I got:
vmovd %edi, %xmm15
movq ops(%rip), %rdx
vpbroadcastb %xmm15, %ymm15
vmovq %xmm15, %rax <<<< move to GPR
vmovdqu %ymm15, (%rdx)
movw %ax, 32(%rdx) <<<< subreg of GPR
vzeroupper
ret
we should see to make lowpart_subreg work on {XI,OI,TI}mode.
Only two steps should be necessary at most:
xmm -> gpr, grp -> subreg, or gpr -> subreg. So the expander
code in memset should try to generate the subreg directly
subreg didn't fail on x86 when I tried.
and if that fails, try a word_mode subreg followed by the subreg.
I will try word_mode subreg.
Here is the v2 patch to use word_mode subreg. For
---
extern void *ops;
void
foo (int c)
{
__builtin_memset (ops, 4, 32);
}
---
without vec_const_duplicate, I got
movl $4, %eax
movq ops(%rip), %rdx
movd %eax, %xmm0
punpcklbw %xmm0, %xmm0
punpcklwd %xmm0, %xmm0
pshufd $0, %xmm0, %xmm0
movups %xmm0, (%rdx)
movups %xmm0, 16(%rdx)
ret
With vec_const_duplicate, I got
movq ops(%rip), %rax
movdqa .LC0(%rip), %xmm0
movups %xmm0, (%rax)
movups %xmm0, 16(%rax)
ret
If vec_duplicate is allowed to fail, I don't need vec_const_duplicate.