Add some missing APX NF and NDD support for imul and mul.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.md (*imulhizu): Added APX
NF support.
(*imulhizu): New define_insn.
(*mulsi3_1_zext): Ditto.
Also add a comment listing the CPUIDs that are not supported in 32-bit mode.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_option_override_internal):
Remove compiler report error for -mapxf or -muintr with
On Thu, Jul 18, 2024, 10:00 AM kong lingling
<lingling.ko...@gmail.com> wrote:
Also add a comment listing the CPUIDs that are not supported in 32-bit mode.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
gcc/ChangeLog:
* config/i386/i386-opti
I adjusted my patch based on H.J.'s comments.
I will add a testcase like gcc.target/i386/pr101395-1.c once the march
for APX is determined.
Ok for trunk?
Thanks,
Lingling
gcc/ChangeLog:
PR target/115978
* config/i386/driver-i386.cc (host_detect_local_cpu): Enable
From: Hongyu Wang
The APX NF (no flags) feature suppresses the update of status flags for
arithmetic operations.
For NF add, it is not clear whether NF add can be faster than lea. If so, the
pattern needs to be adjusted to prefer LEA generation.
gcc/ChangeLog:
* config/i386/i38
gcc/ChangeLog:
* config/i386/i386.md (*sub_1_nf): New define_insn.
(*anddi_1_nf): Ditto.
(*and_1_nf): Ditto.
(*qi_1_nf): Ditto.
(*_1_nf): Ditto.
(*neg_1_nf): Ditto.
* config/i386/sse.md : New define_split.
gcc/testsuite/ChangeLog:
*
gcc/ChangeLog:
* config/i386/i386.md (*ashl3_1_nf): New.
(*ashlhi3_1_nf): Ditto.
(*ashlqi3_1_nf): Ditto.
* config/i386/sse.md: New define_split.
---
gcc/config/i386/i386.md | 175
gcc/config/i386/sse.md | 13 +++
2 files c
gcc/ChangeLog:
* config/i386/i386.md (*ashr3_1_nf): New.
(*lshr3_1_nf): Ditto.
(*lshrqi3_1_nf): Ditto.
(*lshrhi3_1_nf): Ditto.
---
gcc/config/i386/i386.md | 85 +
1 file changed, 85 insertions(+)
diff --git a/gcc/config/i386
gcc/ChangeLog:
* config/i386/i386.md (ashr3_cvt_nf): New define_insn.
(*3_1_nf): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-nf.c: Add NF test for rotate insns.
---
gcc/config/i386/i386.md| 80 ++
gcc/testsuite/gcc.target
gcc/ChangeLog:
* config/i386/i386.md (*mul3_1_nf): New define_insn.
(*mulqi3_1_nf): Ditto.
(*divmod4_noext_nf): Ditto.
(divmodhiqi3_nf): Ditto.
---
gcc/config/i386/i386.md | 86 +
1 file changed, 86 insertions(+)
diff --git
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_nf): New define_insn.
(x86_64_shld_ndd_nf): Ditto.
(x86_64_shld_1_nf): Ditto.
(x86_64_shld_ndd_1_nf): Ditto.
(*x86_64_shld_shrd_1_nozext_nf): Ditto.
(x86_shld_nf): Ditto.
(x86_shld_ndd_nf): Di
gcc/ChangeLog:
* config/i386/i386.md (clz2_lzcnt_nf): New define_insn.
(*clz2_lzcnt_falsedep_nf): Ditto.
(__nf): Ditto.
(*__falsedep_nf): Ditto.
(_hi_nf): Ditto.
(popcount2_nf): Ditto.
(*popcount2_falsedep_nf): Ditto.
(popcounthi2_nf)
> -Original Message-
> From: Uros Bizjak
> Sent: Wednesday, May 15, 2024 4:15 PM
> To: Kong, Lingling
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Wang,
> Hongyu
> Subject: Re: [PATCH 1/8] [APX NF]: Support APX NF add
>
> On Wed, May 15, 2024 at 9:43
> I wonder if we can use "define_subst" to conditionally add flags clobber
> for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the insn
> w/ and w/o the clobber, so I think it is worth considering this approach.
>
> Uros.
Good suggestion. I defined a new subst for no flags, and B
gcc/ChangeLog:
* config/i386/i386.md (nf_and_applied): New subst_attr.
(nf_x64_and_applied): Ditto.
(*sub_1_nf): New define_insn.
(*anddi_1_nf): Ditto.
(*and_1_nf): Ditto.
(*qi_1_nf): Ditto.
(*
gcc/ChangeLog:
* config/i386/i386.md (*ashl3_1_nf): New.
(*ashlhi3_1_nf): Ditto.
(*ashlqi3_1_nf): Ditto.
* config/i386/sse.md: New define_split.
---
gcc/config/i386/i386.md | 80 +++--
gcc/config/i386/sse.md | 13 +++
2 file
gcc/ChangeLog:
* config/i386/i386.md (*ashr3_1_nf): New.
(*lshr3_1_nf): Ditto.
(*lshrqi3_1_nf): Ditto.
(*lshrhi3_1_nf): Ditto.
---
gcc/config/i386/i386.md | 82 +++--
1 file changed, 46 insertions(+), 36 deletions(-)
diff --git
gcc/ChangeLog:
* config/i386/i386.md (ashr3_cvt_nf): New define_insn.
(*3_1_nf): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-nf.c: Add NF test for rotate insns.
---
gcc/config/i386/i386.md| 53 --
gcc/testsuite/gcc.target
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_nf): New define_insn.
(x86_64_shld_ndd_nf): Ditto.
(x86_64_shld_1_nf): Ditto.
(x86_64_shld_ndd_1_nf): Ditto.
(*x86_64_shld_shrd_1_nozext_nf): Ditto.
(x86_shld_nf): Ditto.
(x86_shld_ndd_nf): Di
gcc/ChangeLog:
* config/i386/i386.md (*mul3_1_nf): New define_insn.
(*mulqi3_1_nf): Ditto.
(*divmod4_noext_nf): Ditto.
(divmodhiqi3_nf): Ditto.
---
gcc/config/i386/i386.md | 47 ++---
1 file changed, 30 insertions(+), 17 deletion
gcc/ChangeLog:
* config/i386/i386.md (clz2_lzcnt_nf): New define_insn.
(*clz2_lzcnt_falsedep_nf): Ditto.
(__nf): Ditto.
(*__falsedep_nf): Ditto.
(_hi_nf): Ditto.
(popcount2_nf): Ditto.
(*popcount2_falsedep_nf): Ditto.
(popcounthi2_nf)
Cc Uros.
From: Kong, Lingling
Sent: Wednesday, May 22, 2024 4:35 PM
To: gcc-patches@gcc.gnu.org
Cc: Liu, Hongtao ; Kong, Lingling
Subject: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}
gcc/ChangeLog:
* config/i386/i386.md (nf_and_applied): New subst_attr
Hi, compared with v2, these patches restore the original lea pattern position
and address Hongtao's comment.
The APX NF (no flags) feature suppresses the update of status flags
for arithmetic operations.
For NF add, it is not clear whether nf add can be faster than lea. If so,
the patte
gcc/ChangeLog:
* config/i386/i386.md (*ashl3_1_nf): New.
(*ashlhi3_1_nf): Ditto.
(*ashlqi3_1_nf): Ditto.
* config/i386/sse.md: New define_split.
---
gcc/config/i386/i386.md | 96 ++---
gcc/config/i386/sse.md | 13 ++
2 files
gcc/ChangeLog:
* config/i386/i386.md (nf_nonf_attr): New subst_attr.
(nf_nonf_x64_attr): Ditto.
(*sub_1_nf): New define_insn.
(*anddi_1_nf): Ditto.
(*and_1_nf): Ditto.
(*qi_1_nf): Ditto.
(*_1_nf): Ditto.
(*neg_1_nf): Ditto.
*
gcc/ChangeLog:
* config/i386/i386.md (ashr3_cvt_nf): New define_insn.
(*3_1_nf): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-nf.c: Add NF test for rotate insns.
---
gcc/config/i386/i386.md| 59 +-
gcc/testsuite/gcc.target
gcc/ChangeLog:
* config/i386/i386.md (clz2_lzcnt_nf): New define_insn.
(*clz2_lzcnt_falsedep_nf): Ditto.
(__nf): Ditto.
(*__falsedep_nf): Ditto.
(_hi_nf): Ditto.
(popcount2_nf): Ditto.
(*popcount2_falsedep_nf): Ditto.
(popcounthi2_nf)
gcc/ChangeLog:
* config/i386/i386.md (*ashr3_1_nf): New.
(*lshr3_1_nf): Ditto.
(*lshrqi3_1_nf): Ditto.
(*lshrhi3_1_nf): Ditto.
---
gcc/config/i386/i386.md | 82 +++--
1 file changed, 46 insertions(+), 36 deletions(-)
diff --git
gcc/ChangeLog:
* config/i386/i386.md (*mul3_1_nf): New define_insn.
(*mulqi3_1_nf): Ditto.
(*divmod4_noext_nf): Ditto.
(divmodhiqi3_nf): Ditto.
---
gcc/config/i386/i386.md | 47 ++---
1 file changed, 30 insertions(+), 17 deletion
gcc/ChangeLog:
* config/i386/i386.md (x86_64_shld_nf): New define_insn.
(x86_64_shld_ndd_nf): Ditto.
(x86_64_shld_1_nf): Ditto.
(x86_64_shld_ndd_1_nf): Ditto.
(*x86_64_shld_shrd_1_nozext_nf): Ditto.
(x86_shld_nf): Ditto.
(x86_shld_ndd_nf): Di
Hi,
Gently ping.
Thanks,
Lingling
From: Kong, Lingling
Sent: Tuesday, June 25, 2024 2:46 PM
To: gcc-patches@gcc.gnu.org
Cc: Alexander Monakov ; Uros Bizjak ;
lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law
; Richard Biener
Subject: RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in
For APX instruction with an NDD, the destination GPR will get the
instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its
upper bits [63:OSIZE] zeroed. Now supporting other NDD instructions.
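
As a rough illustration of the zero-upper rule described above, the following
C sketch models the value a 64-bit destination GPR holds after an NDD
instruction. The function name ndd_dest_value and its parameters are
hypothetical and exist only for this example; it is a model of the described
behaviour, not GCC or hardware code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: RESULT is the raw result of the operation and OSIZE
   is the operand size in bits.  The destination keeps bits [OSIZE-1:0] of
   the result and, when OSIZE < 64, has bits [63:OSIZE] zeroed.  */
uint64_t
ndd_dest_value (uint64_t result, unsigned osize)
{
  assert (osize == 8 || osize == 16 || osize == 32 || osize == 64);
  if (osize == 64)
    return result;
  /* Keep only the low OSIZE bits; everything above becomes zero.  */
  return result & ((UINT64_C (1) << osize) - 1);
}
```

Under this model a separate zero-extension after a QImode or HImode NDD
operation would be redundant, which is presumably what the *_zext patterns
are meant to exploit.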
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*subqi_1_zext): New
define_insn.
(*subhi_1_zext): Ditto.
(*addqi3_carry_zext): Ditto.
(*addhi3_carry_zext): Ditto.
(*addqi3_carry_
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*andqi_1_zext):
New define_insn.
(*andhi_1_zext): Ditto.
(*qi_1_zext): Ditto.
(*hi_1_zext): Ditto.
(*negqi_1_zext): Ditto.
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*ashlqi3_1_zext):
New define_insn.
(*ashlhi3_1_zext): Ditto.
(*qi3_1_zext): Ditto.
(*hi3_1_zext): Ditto.
(*qi3_1_zext): Ditto.
Hi,
Gently ping.
Thanks,
Lingling
From: kong lingling
Sent: Monday, August 12, 2024 3:10 PM
To: gcc-patches@gcc.gnu.org
Cc: H. J. Lu ; Kong, Lingling ;
Liu, Hongtao
Subject: [PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for
ADD
For APX instruction with an NDD, the
Although these VEX insns have EVEX counterparts, they should not support APX
EGPR when the explicit VEX prefix is used, as with TARGET_AVXVNNI, TARGET_IFMA
and TARGET_AVXNECONVERT.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/sse.md (vp
-Original Message-
From: Kong, Lingling
Sent: Wednesday, August 14, 2024 4:20 PM
To: Kong, Lingling
Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr
Although these VEX insns have EVEX counterparts, when the explicit
VEX prefix is used they should not support APX EGPR
Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.
gcc/ChangeLog:
* config/i386/i386-opts.h (enum apx_features): Add apx_zu.
* config/i386/i386.h (TARGET_APX_ZU): Define.
* config/i386/i386.md (*imulhizu): New define_insn.
(*setcc__zu): Ditto.
* config/i3
From: konglin1
gcc/ChangeLog:
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
* target.def (bool,): New hook.
* targhooks.cc (default_have_conditional_move_mem_notrap): New
function to hook TARGET_HAVE_CONDITIONAL_
From: konglin1 <lingling.k...@intel.com>
The APX CFCMOV feature implements conditional faulting, which means that all
memory faults are suppressed when the condition code evaluates to false during
a load or store of a memory operand. Now we can load or store a memory operand
that may trap or fault f
The APX CFCMOV[1] feature implements conditional faulting, which means that all
memory faults are suppressed when the condition code evaluates to false during
a load or store of a memory operand. Now a conditional move can load or store a
memory operand that may trap or fault.
In middle-end, now we don't
From: konglin1
The APX CFCMOV feature implements conditional faulting, which means that all
memory faults are suppressed when the condition code evaluates to false during
a load or store of a memory operand. Now a conditional move can load or store a
memory operand that may trap or fault.
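
The conditional-faulting behaviour described above can be sketched in plain C.
This is only a model under stated assumptions: the names cfcmov_load_model and
cfcmov_store_model are hypothetical, and the sketch shows the intended
semantics (memory is touched only when the condition holds), not how GCC
expands the instruction.

```c
#include <stddef.h>

/* Hypothetical model of a conditionally faulting load: ADDR is only
   dereferenced when COND is true, so an invalid address cannot fault
   when COND is false.  */
int
cfcmov_load_model (int cond, const int *addr, int old_value)
{
  return cond ? *addr : old_value;
}

/* Hypothetical model of a conditionally faulting store: nothing is
   written, and ADDR is never accessed, when COND is false.  */
void
cfcmov_store_model (int cond, int *addr, int value)
{
  if (cond)
    *addr = value;
}
```

With a false condition, both helpers can be called with a null pointer
without accessing memory, which is the property that lets if-conversion
handle loads and stores that may trap.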
In middle-end,
From: Lingling Kong
After adding the target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP,
the if-convert pass can support a conditional move whose memory load or
store may trap or fault.
A fault-suppressing conditional move for a conditional memory store would
not move any arithmetic calculations. For conditio
From: Lingling Kong
Handle target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP and support
CFCMOV in backend.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
tests whether the cfcmov can be generated.
(ix86_expand_int_movcc): Expand to cfcmo
Fix test for APX ZU. Add attribute for no-inline and target APX, and
target-support check.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Committed as an obvious patch.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-zu-1.c: Add attribute for noinline,
The APX CFCMOV feature implements conditional faulting, which means
that all memory faults are suppressed when the condition code
evaluates to false during a load or store of a memory operand. Now
a conditional move can load or store a memory operand that may trap
or fault.
In middle-end, now we don'
deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c
--
> -Original Message-
> From: Hongtao Liu
> Sent: Monday, June 17, 2024 11:05 AM
> To: Jeff Law
> Cc: Alexander Monakov ; Kong, L
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
tests whether the cfcmov can be generated.
(ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
returns true.
* config/i386/i386-opts.h (enum apx_features): Add apx
Hi,
Gently ping for this.
This version has removed the target hook and added a new optab for cfcmov.
Thanks,
Lingling
From: Kong, Lingling
Sent: Tuesday, June 18, 2024 3:41 PM
To: gcc-patches@gcc.gnu.org
Cc: Alexander Monakov ; Uros Bizjak ;
lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law
> -Original Message-
> From: Haochen Jiang
> Sent: Monday, July 22, 2024 2:41 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Change prefetchi output template
>
> Hi all,
>
> For prefetchi instructions, RIP-relative address is explicitl
Tested SPEC2017 performance on Sierra Forest, Icelake and CascadeLake; there
is no obvious regression.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
OK for trunk?
gcc/ChangeLog:
* config/i386/x86-tune-costs.h (struct processor_costs):
Adjust rtx_cost of imulq
*add_4 and *adddi_4 exist to get a shorter opcode by converting cmp to inc/dec
or add $128.
But the NDD code is longer than the cmp code, so there is no need to support NDD.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
PR target/113744
* con
The je constraint should be used for APX NDD ADD with register source
operand. The jM is for APX NDD patterns with immediate operand.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.md (nf_mem_constraint): Fixed the constraint
> -Original Message-
> From: Liu, Hongtao
> Sent: Thursday, August 1, 2024 9:35 AM
> To: Kong, Lingling ; gcc-patches@gcc.gnu.org
> Cc: Wang, Hongyu
> Subject: RE: [PATCH] i386: Fix memory constraint for APX NF
>
>
>
> > -Original Message-
>
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/constraints.md: Fixed the comment/naming
for je/jM/jO.
* config/i386/predicates.md (apx_ndd_memory_operand):
Renamed and fixed the comment.
(apx_evex_memory
Hi,
(if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg/v:SI 101 [ e ])
(reg:SI 102))
The cost of this rtx is 8, and the cost of
(eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but that is just an operator,
so there is no need to compute its cost in cmov.
Bootstrapped and regtest
Hi,
Originally eliminate_regs_in_insn will transform
(parallel [
(set (reg:QI 130)
(plus:QI (subreg:QI (reg:DI 19 frame) 0)
(const_int 96)))
(clobber (reg:CC 17 flag))]) {*addqi_1}
to
(set (reg:QI 130)
(subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal}
when verify_changes.
But
Hi,
This version adds a new optab named 'cfmovcc', which is used in the middle
end to expand to cfcmov. It also simplifies my patch by trying to generate
the conditional faulting movcc in the noce_try_cmove_arith function.
All the changes passed bootstrap & regtest on x86-64-pc-linux-gnu.
We al
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function
that tests whether the cfcmov can be generated.
(ix86_expand_int_cfmovcc): Expand to cfcmov pattern.
* config/i386/i386-opts.h (enum apx_features): New.
*
> -Original Message-
> From: Richard Sandiford
> Sent: Friday, September 6, 2024 5:19 PM
> To: Kong, Lingling
> Cc: gcc-patches@gcc.gnu.org; Jeff Law ; Richard Biener
> ; Uros Bizjak ; Hongtao Liu
> ; Jakub Jelinek
> Subject: Re: [PATCH v3 1/2] [APX CFCMO
Hi,
Fixed scalar VCOMSBF16 misused in AVX10.2.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/sse.md (avx10_2_comsbf16_v8bf): Fixed scalar
operands.
---
gcc/config/i386/sse.md | 8 ++--
1 file changed, 6 insertions(+), 2
> > "Kong, Lingling" writes:
> > > Hi,
> > >
> > > This version has added a new optab named 'cfmovcc'. The new optab is
> > > used in the middle end to expand to cfcmov. And simplified my patch
> > > by trying to generate th
Hi,
Now that the APX NF, CCMP and NF features are supported, the comment for the
APX option also needs an update.
Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.opt: Update the features included in apxf.
---
gcc/config/i386/i386.opt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/g
Hi,
Many thanks to Richard for the suggestion that a conditional load is like a
scalar instance of maskload_optab. So this version uses the maskload and
maskstore optabs to expand and generate cfcmov in the ifcvt pass.
All the changes passed bootstrap & regtest on x86-64-pc-linux-gnu.
We also tested spec
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_int_cfmovcc): Expand
to cfcmov pattern.
* config/i386/i386-opts.h (enum apx_features): New.
* config/i386/i386-protos.h (ix86_expand_int_cfmovcc): Define.
* config/i386/i386.cc (ix86_rtx_costs): Add U
Hi,
LGTM.
Now Hongyu and Hongtao are working on APX.
Thanks,
Lingling
> -Original Message-
> From: Gregory Kanter
> Sent: Saturday, November 23, 2024 8:16 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Kong, Lingling ; Gregory Kanter
>
> Subject: Patch ping - [PATCH] [A
Hi,
This patch fixes the _mm_[u]comixx_{ss,sd} codegen and adds the PF result. These
intrinsics have changed over time: for `_mm_comieq_ss`, the old operation is
`RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and the updated operation is `RETURN (
a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1
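
To make the role of PF concrete, here is a small C model of the flag results
of a COMISS-style compare, where an unordered result (either input is NaN)
sets ZF, PF and CF together. The struct and function names are hypothetical
and used only for illustration. The sketch assumes the old codegen looked at
ZF alone and that the fix additionally checks PF, which is how the snippet
above reads.

```c
#include <math.h>  /* NAN, for callers that want to try the NaN case */

/* Hypothetical model of the flags written by a COMISS-style compare:
   unordered -> ZF=PF=CF=1, a == b -> ZF=1, a < b -> CF=1, a > b -> all 0.  */
struct comi_flags
{
  int zf, pf, cf;
};

struct comi_flags
comiss_model (float a, float b)
{
  struct comi_flags f = { 0, 0, 0 };
  if (a != a || b != b)        /* x != x is true only for NaN.  */
    f.zf = f.pf = f.cf = 1;
  else if (a == b)
    f.zf = 1;
  else if (a < b)
    f.cf = 1;
  return f;
}

/* Equality decided from ZF alone: reports 1 when an input is NaN.  */
int
comieq_zf_only (float a, float b)
{
  return comiss_model (a, b).zf;
}

/* Equality decided from ZF and PF: reports 0 when an input is NaN.  */
int
comieq_zf_and_not_pf (float a, float b)
{
  struct comi_flags f = comiss_model (a, b);
  return f.zf && !f.pf;
}
```

For ordinary inputs the two helpers agree; they differ only when one operand
is NaN, which is exactly the case the updated operation excludes.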
Hi,
This patch enables the __bf16 scalar type for targets with SSE2 and above,
according to the psABI (https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/35/diffs).
As on Arm, the __bf16 type is a storage type.
OK for master?
gcc/ChangeLog:
* config/i386/i386-builtin-types.def (BFLOAT16): New pr
Hi,
The old patch had a mistake in `*movbf_internal`; this version disables the
BFmode constant double move in `*movbf_internal`.
Thanks,
Lingling
> -Original Message-
> From: Kong, Lingling
> Sent: Tuesday, July 26, 2022 9:31 AM
> To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Cc:
Hi,
For _mm512_fpclass_ps_mask at -O0, the mask should be (__mmask16)-1 instead of
(__mmask8)-1.
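
The difference between the two casts can be shown without any AVX-512
hardware. The sketch below uses hypothetical stand-in typedefs (mask8_model
and mask16_model, assumed here to mirror 8-bit and 16-bit unsigned mask
types) and counts how many of the 16 float lanes of a 512-bit vector each
all-ones constant would enable.

```c
#include <stdint.h>

/* Hypothetical stand-ins for 8-bit and 16-bit mask types.  */
typedef uint8_t mask8_model;
typedef uint16_t mask16_model;

/* Count the lanes enabled by a 16-bit write mask (one bit per lane).  */
unsigned
count_enabled_lanes (mask16_model mask)
{
  unsigned n = 0;
  for (; mask != 0; mask >>= 1)
    n += mask & 1u;
  return n;
}
```

Passing (mask16_model)-1 enables all 16 lanes, while (mask8_model)-1 widens
to 0x00ff and enables only the low 8, so the upper half of the vector would
be silently masked off.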
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for master?
gcc/ChangeLog:
* gcc/config/i386/avx512dqintrin.h: Fix _mm512_fpclass_ps_mask define at -O0.
gcc/testsuite/ChangeLog:
* gcc.target/
Hi,
For avx512f_scattersi, the mask operand only affects the set src; we
need to refine the pattern to let GCC know the mask register also affects the dest.
So we put the mask operand into UNSPEC_VSIBADDR.
Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?
gcc/ChangeLog:
*config/i3
Hi,
For avx512f_scattersi, the mask operand only affects the set src; we need to
refine the pattern to let GCC know the mask register also affects the dest.
So we put the mask operand into UNSPEC_VSIBADDR.
Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?
gcc/ChangeLog:
P
Hi
This patch fixes vec_init_dup_v16bf by adding correct handling for V16BF mode in
ix86_expand_vector_init_duplicate.
It also adds a testcase with SSE2 but without AVX2.
OK for master?
gcc/ChangeLog:
PR target/106887
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Fixe
it in new patch. Thanks.
Ok for master ?
Thanks,
Lingling
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, September 14, 2022 4:16 PM
> To: Kong, Lingling
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> Subject: Re: [PATCH] Enhance final_value_replaceme
anks again for take a look.
OK for master ?
Thanks,
Lingling
> -Original Message-
> From: Hongtao Liu
> Sent: Thursday, September 15, 2022 11:46 AM
> To: Kong, Lingling
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf
> > ..
> > else if (tree_fits_uhwi_p (niter)
> > ... bitwise induction case...)
> > ...
> >
> Yes, I fixed it in new patch. Thanks.
> Ok for master ?
>
> Thanks,
> Lingling
>
> > -Original Message-
> > From: Richard Biener
Hi,
vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
And cleared before conversion, updated movhi_internal and
ix86_can_change_mode_class.
OK for master?
gcc/ChangeLog:
PR target/102811
insn can optimize scalar load to a
vector.
Thanks,
Lingling
-Original Message-
From: Uros Bizjak
Sent: Wednesday, November 24, 2021 3:57 PM
To: Kong, Lingling
Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert
_Floa
Hi,
vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
Cleared before conversion, updated movhi_internal and
ix86_can_change_mode_class. And fixed some commit message.
OK for master?
gcc/ChangeLog:
OK, this is the patch I am preparing to check in.
-Original Message-
From: Uros Bizjak
Sent: Wednesday, November 24, 2021 4:49 PM
To: Kong, Lingling
Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert
_Float16 to SFmode
en Jiang via Gcc-patches
> wrote:
> >
> > From: Kong Lingling
> > +(define_insn "vbcstne2ps_"
> > + [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > +(vec_duplicate:VF1_128_256
> > + (unspec:SF
> > +
Hi,
Previously we used unsigned short to represent bf16. That is not a good
representation, and at the time the front end didn't support a bf16 type.
Now that __bf16 has been introduced into the x86 psABI, we can switch the
intrinsics to the new type.
Ok for trunk ?
Thanks,
Lingling
gcc/ChangeLog:
* config/i386
ctober 25, 2022 1:23 PM
> To: Kong, Lingling
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang,
> Haochen
> Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
>
> On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling
> wrote:
> >
> > > From: Gcc-patches
> >
Hi
This patch mentions Intel __bf16 support in AVX512BF16 intrinsics.
Ok for master ?
Thanks,
Lingling
---
htdocs/gcc-13/changes.html | 2 ++
1 file changed, 2 insertions(+)
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
7c6bfa6e..cd0282f1 100644
--- a/htdocs/
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 7c6bfa6e..cd0282f1 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -230,6 +230,8 @@ a work-in-progress.
> >For both C and C++ the __bf16 type is supported on
> >x86
> > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > > index 7c6bfa6e..cd0282f1 100644
> > > --- a/htdocs/gcc-13/changes.html
> > > +++ b/htdocs/gcc-13/changes.html
> > > @@ -230,6 +230,8 @@ a work-in-progress.
> > >For both C and C++ the __bf16 type is supported on
>
Hi,
This patch supports vector init/broadcast/set/extract for the __bf16 type.
The __bf16 type is a storage type.
OK for master?
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
BFmode.
(ix86_expand_vector_init_duplicate): Support vector BF
Hi,
This patch is for pr105735/pr101991. It enables the optimization below:
{
- long unsigned int bit;
-
- [local count: 32534376]:
-
- [local count: 1041207449]:
- # tmp_10 = PHI
- # bit_12 = PHI
- tmp_7 = bit2_6(D) & tmp_10;
- bit_8 = bit_12 + 1;
- if (bit_8 != 32)
-goto ; [96.97
Hi
This patch mentions Intel __bf16 support in gcc13.
Ok for master ?
Thanks,
Lingling
htdocs/gcc-13/changes.html | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
57bd8724..7d98329c 100644
--- a/htdocs/g
Hi Richard, could you help take a look at the patch?
> Hi,
>
> This patch is for pr105735/pr101991. It will enable below optimization:
> {
> - long unsigned int bit;
> -
> - [local count: 32534376]:
> -
> - [local count: 1041207449]:
> - # tmp_10 = PHI
> - # bit_12 = PHI
> - tmp
Hi,
The conditional mult reduction cannot be recognized with current GCC. The
following loop cannot be vectorized.
Now add MULT_EXPR recognition for conditional scalar reduction.
float summa(int n, float *arg1, float *arg2)
{
int i;
Hi,
Handle E_V8BFmode in expand_vec_perm_broadcast_1 and
ix86_expand_vector_init_duplicate.
Ok for trunk?
gcc/ChangeLog:
PR target/106742
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Handle V8BF mode.
(expand_vec_perm_broadcast_1): Ditto.
gc
Hi Richard, could you help take a look at the patch?
Ok for master ?
> Hi,
>
> The conditional mult reduction cannot be recognized with current GCC. The
> following loop cannot be vectorized.
> Now add MULT_EXPR recognition for conditional scalar reduction.
>
> float summa(int n, float *
Hi,
I fixed it in a new patch. And added BF vector mode in SUBST_V and
avx512fmaskhalfmode for @vec_interleave_high.
Ok for trunk ?
> > Hi,
> >
> > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and
> ix86_expand_vector_init_duplicate.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >
> + if ((bitinv_def
>
> please use else if here
Sorry, if I use else if here, there is no corresponding if above. I'm not
sure whether you mean changing the bitwise induction expression's if to else if.
Do you agree with these changes? Thanks again for taking a look.
Thanks,
Lingling
>
Hi,
This patch is to support cmla_optab, cmul_optab, cmla_conj_optab,
cmul_conj_optab for vector _Float16.
Ok for master?
gcc/ChangeLog:
* config/i386/sse.md (cmul3): add new define_expand.
(cmla4): Likewise
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vector-
Hi,
This patch supports folding _mm512_fmadd_pch (a, _mm512_set1_pch(*(b)), c) into
a single instruction, vfmaddcph (%rsp){1to16}, %zmm1, %zmm2.
OK for master?
gcc/ChangeLog:
* config/i386/sse.md (fma___pair):
Add new define_insn.
(fma__fmaddc_bcst): Add new define_insn_and_spli
Hi,
vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
OK for master?
gcc/ChangeLog:
PR target/102811
* config/i386/i386.md (extendhfsf2): Add extendhfsf2 for f16c.
(extendh
Hi,
This patch adds aliases for the f*mul_*ch intrinsics.
Ok for master?
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for
_mm512_fmul_pch.
(_mm512_mask_mul_pch): Likewise.
(_mm512_maskz_mul_pch): Likewise.
(_mm512_mul_round_pch): L