[PATCH] i386: Support APX NF and NDD for imul/mul

2024-07-01 Thread kong lingling
Add some missing APX NF and NDD support for imul and mul. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (*imulhizu): Added APX NF support. (*imulhizu): New define_insn. (*mulsi3_1_zext): Ditto.

[PATCH] i386: Remove report error for -mapxf/-muintr with -m32

2024-07-17 Thread kong lingling
Also add a comment listing the CPUID features that are not supported in 32-bit mode. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386-options.cc (ix86_option_override_internal): Remove compiler report error for -mapxf or -muintr with

RE: [PATCH] i386: Remove report error for -mapxf/-muintr with -m32

2024-07-17 Thread Kong, Lingling
On Thu, Jul 18, 2024, 10:00 AM kong lingling <lingling.ko...@gmail.com> wrote: Also add a comment listing the CPUID features that are not supported in 32-bit mode. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386-opti

[PATCH] x86: Don't enable APX_F in 32-bit mode.

2024-07-18 Thread Kong, Lingling
I adjusted my patch based on the comments by H.J. And I will add the testcase like gcc.target/i386/pr101395-1.c when the march for APX is determined. Ok for trunk? Thanks, Lingling gcc/ChangeLog: PR target/115978 * config/i386/driver-i386.cc (host_detect_local_cpu): Enable

[PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Kong, Lingling
From: Hongyu Wang The APX NF (no flags) feature suppresses the update of status flags for arithmetic operations. For NF add, it is not clear whether NF add can be faster than lea. If so, the pattern needs to be adjusted to prefer LEA generation. gcc/ChangeLog: * config/i386/i38

[PATCH 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto. (*_1_nf): Ditto. (*neg_1_nf): Ditto. * config/i386/sse.md : New define_split. gcc/testsuite/ChangeLog: *

[PATCH 3/8] [APX NF] Support APX NF for left shift insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 175 gcc/config/i386/sse.md | 13 +++ 2 files c

[PATCH 4/8] [APX NF] Support APX NF for right shift insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 85 + 1 file changed, 85 insertions(+) diff --git a/gcc/config/i386

[PATCH 5/8] [APX NF] Support APX NF for rotate insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 80 ++ gcc/testsuite/gcc.target

[PATCH 7/8] [APX NF] Support APX NF for mul/div

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 86 + 1 file changed, 86 insertions(+) diff --git

[PATCH 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf): Di

[PATCH 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto. (popcounthi2_nf)

RE: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Kong, Lingling
> -Original Message- > From: Uros Bizjak > Sent: Wednesday, May 15, 2024 4:15 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Wang, > Hongyu > Subject: Re: [PATCH 1/8] [APX NF]: Support APX NF add > > On Wed, May 15, 2024 at 9:43

[PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Kong, Lingling
> I wonder if we can use "define_subst" to conditionally add flags clobber > for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the insn > w/ and w/o the clobber, so I think it is worth considering this approach. > > Uros. Good Suggestion, I defined new subst for no flags, and B

[PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (nf_and_applied): New subst_attr. (nf_x64_and_applied): Ditto. (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto. (*

[PATCH v2 3/8] [APX NF] Support APX NF for left shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 80 +++-- gcc/config/i386/sse.md | 13 +++ 2 file

[PATCH v2 4/8] [APX NF] Support APX NF for right shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 82 +++-- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git

[PATCH v2 5/8] [APX NF] Support APX NF for rotate insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 53 -- gcc/testsuite/gcc.target

[PATCH v2 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf): Di

[PATCH v2 7/8] [APX NF] Support APX NF for mul/div

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 47 ++--- 1 file changed, 30 insertions(+), 17 deletion

[PATCH v2 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto. (popcounthi2_nf)

RE: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
Cc Uros. From: Kong, Lingling Sent: Wednesday, May 22, 2024 4:35 PM To: gcc-patches@gcc.gnu.org Cc: Liu, Hongtao ; Kong, Lingling Subject: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg} gcc/ChangeLog: * config/i386/i386.md (nf_and_applied): New subst_attr

[PATCH v3 1/8] [APX NF]: Support APX NF add

2024-05-28 Thread Kong, Lingling
Hi, compared with v2, these patches restored the original lea pattern position and addressed Hongtao's comment. The APX NF (no flags) feature suppresses the update of status flags for arithmetic operations. For NF add, it is not clear whether NF add can be faster than lea. If so, the patte

[PATCH v3 3/8] [APX NF] Support APX NF for left shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 96 ++--- gcc/config/i386/sse.md | 13 ++ 2 files

[PATCH v3 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (nf_nonf_attr): New subst_attr. (nf_nonf_x64_attr): Ditto. (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto. (*_1_nf): Ditto. (*neg_1_nf): Ditto. *

[PATCH v3 5/8] [APX NF] Support APX NF for rotate insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 59 +- gcc/testsuite/gcc.target

[PATCH v3 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto. (popcounthi2_nf)

[PATCH v3 4/8] [APX NF] Support APX NF for right shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 82 +++-- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git

[PATCH v3 7/8] [APX NF] Support APX NF for mul/div

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 47 ++--- 1 file changed, 30 insertions(+), 17 deletion

[PATCH v3 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf): Di

PING [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-08-08 Thread Kong, Lingling
Hi, Gently ping. Thanks, Lingling From: Kong, Lingling Sent: Tuesday, June 25, 2024 2:46 PM To: gcc-patches@gcc.gnu.org Cc: Alexander Monakov ; Uros Bizjak ; lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law ; Richard Biener Subject: RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in

[PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for ADD

2024-08-12 Thread kong lingling
For APX instruction with an NDD, the destination GPR will get the instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its upper bits [63:OSIZE] zeroed. Now supporting other NDD instructions. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog:
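The zero-upper rule quoted in this message can be modelled in a few lines of C. This is only a sketch of the architectural behaviour as described in the snippet; `ndd_dest_value` and `ndd_add8` are hypothetical helper names and are not part of GCC or of the patch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC: models the rule quoted above for
   an NDD destination register.  OSIZE is the operand size in bits
   (8, 16, 32 or 64).  */
static uint64_t
ndd_dest_value (uint64_t raw_result, unsigned osize)
{
  if (osize >= 64)
    return raw_result;
  /* Keep bits [OSIZE-1:0]; bits [63:OSIZE] read as zero.  */
  return raw_result & ((UINT64_C (1) << osize) - 1);
}

/* Model of an 8-bit NDD add: the sum wraps at 8 bits and the upper 56
   bits of the destination are zero, so a following zero-extension of the
   result would be redundant.  */
static uint64_t
ndd_add8 (uint8_t a, uint8_t b)
{
  return ndd_dest_value ((uint64_t) a + b, 8);
}
```

Under this model, `ndd_add8 (200, 100)` yields 44 with all upper bits clear, which is the property the `*_zext` patterns in this series appear to rely on.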

[PATCH 2/4] i386: Optimization for APX NDD is always zero-uppered for sub/adc/sbb

2024-08-12 Thread kong lingling
gcc/ChangeLog: PR target/113729 * config/i386/i386.md (*subqi_1_zext): New define_insn. (*subhi_1_zext): Ditto. (*addqi3_carry_zext): Ditto. (*addhi3_carry_zext): Ditto. (*addqi3_carry_

[PATCH 3/4] i386: Optimization for APX NDD is always zero-uppered for logic

2024-08-12 Thread kong lingling
gcc/ChangeLog: PR target/113729 * config/i386/i386.md (*andqi_1_zext): New define_insn. (*andhi_1_zext): Ditto. (*qi_1_zext): Ditto. (*hi_1_zext): Ditto. (*negqi_1_zext): Ditto.

[PATCH 4/4] i386: Optimization for APX NDD is always zero-uppered for shift

2024-08-12 Thread kong lingling
gcc/ChangeLog: PR target/113729 * config/i386/i386.md (*ashlqi3_1_zext): New define_insn. (*ashlhi3_1_zext): Ditto. (*qi3_1_zext): Ditto. (*hi3_1_zext): Ditto. (*qi3_1_zext): Ditto.

RE: [PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for ADD

2024-08-13 Thread Kong, Lingling
Hi, Gently ping. Thanks, Lingling From: kong lingling Sent: Monday, August 12, 2024 3:10 PM To: gcc-patches@gcc.gnu.org Cc: H. J. Lu ; Kong, Lingling ; Liu, Hongtao Subject: [PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for ADD For APX instruction with an NDD, the

[PATCH] i386: Fix some vex insns that prohibit egpr

2024-08-13 Thread Kong, Lingling
Although these VEX insns have EVEX counterparts, they should not support APX EGPR when the explicit VEX prefix is used, as with TARGET_AVXVNNI, TARGET_IFMA and TARGET_AVXNECONVERT. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/sse.md (vp

[PATCH v2] i386: Fix some vex insns that prohibit egpr

2024-08-14 Thread Kong, Lingling
-Original Message- From: Kong, Lingling Sent: Wednesday, August 14, 2024 4:20 PM To: Kong, Lingling Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr Although these VEX insns have EVEX counterparts, they should not support APX EGPR when the explicit VEX prefix is used

[PATCH] [APX ZU] Support APX zero-upper

2024-06-06 Thread Kong, Lingling
Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. gcc/ChangeLog: * config/i386/i386-opts.h (enum apx_features):Add apx_zu. * config/i386/i386.h (TARGET_APX_ZU): Define. * config/i386/i386.md (*imulhizu): New define_insn. (*setcc__zu): Ditto. * config/i3

[PATCH 1/2] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 gcc/ChangeLog: * doc/tm.texi: Regenerated. * doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP * target.def (bool,): New hook. * targhooks.cc (default_have_conditional_move_mem_notrap): New function to hook TARGET_HAVE_CONDITIONAL_

[PATCH 2/2] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
From: konglin1 <lingling.k...@intel.com> The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false and a memory operand is loaded or stored. Now we can load or store a memory operand that may trap or fault f

[PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
The APX CFCMOV[1] feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false and a memory operand is loaded or stored. Now a conditional move can load or store a memory operand that may trap or fault. In the middle end, now we don't
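As a rough illustration of the kind of source this series is concerned with, consider a guarded load where the pointer may be invalid when the condition is false. The function below is a hypothetical example, not a testcase from the patch; whether GCC actually turns it into a cfcmov depends on the patch and on the target options, which this listing does not show in full.

```c
#include <stddef.h>

/* Hypothetical example: the load from P must not fault when COND is
   false, because P may be NULL or otherwise invalid in that case.  A
   plain conditional move with a memory source would perform the load
   unconditionally, so without fault suppression this branch cannot be
   if-converted safely.  */
int
load_if (int cond, const int *p, int fallback)
{
  int x = fallback;
  if (cond)
    x = *p;
  return x;
}
```

Calling `load_if (0, NULL, 7)` must return 7 without touching memory; a conditionally faulting move preserves that guarantee while still removing the branch.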

[PATCH 1/3] [APX CFCMOV] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false and a memory operand is loaded or stored. Now a conditional move can load or store a memory operand that may trap or fault. In the middle end,

[PATCH 2/3] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-13 Thread Kong, Lingling
From: Lingling Kong After adding the target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP, the if-convert pass can support a conditional move whose memory load or store may trap or fault. A fault-suppressing conditional move for a conditional memory store does not move any arithmetic calculations. For conditio

[PATCH 3/3] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-13 Thread Kong, Lingling
From: Lingling Kong Handle target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP and support CFCMOV in backend. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that test if the cfcmov can be generated. (ix86_expand_int_movcc): Expand to cfcmo

[PATCH Committed][APX ZU] Fix test for target-support check

2024-06-17 Thread Kong, Lingling
Fix test for APX ZU. Add attribute for no-inline and target APX, and target-support check. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Committed as an obvious patch. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-zu-1.c: Add attribute for noinline,

[PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-18 Thread Kong, Lingling
The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false and a memory operand is loaded or stored. Now a conditional move can load or store a memory operand that may trap or fault. In the middle end, now we don'

[PATCH v2 0/2] [APX CFCMOV] Support APX CFCMOV

2024-06-18 Thread Kong, Lingling
deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c -- > -Original Message- > From: Hongtao Liu > Sent: Monday, June 17, 2024 11:05 AM > To: Jeff Law > Cc: Alexander Monakov ; Kong, L

[PATCH v2 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-18 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that tests if the cfcmov can be generated. (ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p returns true. * config/i386/i386-opts.h (enum apx_features): Add apx

RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-24 Thread Kong, Lingling
Hi, Gently ping for this. This version has removed the target hook and added a new optab for cfcmov. Thanks, Lingling From: Kong, Lingling Sent: Tuesday, June 18, 2024 3:41 PM To: gcc-patches@gcc.gnu.org Cc: Alexander Monakov ; Uros Bizjak ; lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law

RE: [PATCH] i386: Change prefetchi output template

2024-07-22 Thread Kong, Lingling
> -Original Message- > From: Haochen Jiang > Sent: Monday, July 22, 2024 2:41 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Change prefetchi output template > > Hi all, > > For prefetchi instructions, RIP-relative address is explicitl

[PATCH] i386: Adjust rtx cost for imulq and imulw [PR115749]

2024-07-24 Thread Kong, Lingling
Tested spec2017 performance in Sierra Forest, Icelake, CascadeLake, at least there is no obvious regression. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. OK for trunk? gcc/ChangeLog: * config/i386/x86-tune-costs.h (struct processor_costs): Adjust rtx_cost of imulq

[PATCH] i386: Remove ndd support for *add_4 [PR113744]

2024-07-30 Thread Kong, Lingling
*add_4 and *adddi_4 are for shorter opcode from cmp to inc/dec or add $128. But NDD code is longer than the cmp code, so there is no need to support NDD. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/113744 * con

[PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Kong, Lingling
The je constraint should be used for APX NDD ADD with register source operand. The jM is for APX NDD patterns with immediate operand. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (nf_mem_constraint): Fixed the constraint

RE: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Kong, Lingling
> -Original Message- > From: Liu, Hongtao > Sent: Thursday, August 1, 2024 9:35 AM > To: Kong, Lingling ; gcc-patches@gcc.gnu.org > Cc: Wang, Hongyu > Subject: RE: [PATCH] i386: Fix memory constraint for APX NF > > > > > -Original Message- &

[PATCH] i386: Fix comment/naming for APX NDD constraints

2024-08-01 Thread Kong, Lingling
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/constraints.md: Fixed the comment/naming for je/jM/jO. * config/i386/predicates.md (apx_ndd_memory_operand): Renamed and fixed the comment. (apx_evex_memory

[PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-05 Thread Kong, Lingling
Hi, (if_then_else:SI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg/v:SI 101 [ e ]) (reg:SI 102)) The cost is 8 for the rtx; the cost for (eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but this is just an operator, so its cost does not need to be computed in cmov. Bootstrapped and regtest

[PATCH] i386: fix ix86_hardreg_mov_ok with lra_in_progress

2024-05-06 Thread Kong, Lingling
Hi, Originally eliminate_regs_in_insn will transform (parallel [ (set (reg:QI 130) (plus:QI (subreg:QI (reg:DI 19 frame) 0) (const_int 96))) (clobber (reg:CC 17 flag))]) {*addqi_1} to (set (reg:QI 130) (subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal} when verify_changes. But

[PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-05 Thread Kong, Lingling
Hi, This version has added a new optab named 'cfmovcc'. The new optab is used in the middle end to expand to cfcmov. And simplified my patch by trying to generate the conditional faulting movcc in noce_try_cmove_arith function. All the changes passed bootstrap & regtest x86-64-pc-linux-gnu. We al

[PATCH v3 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-09-05 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New func that test if the cfcmov can be generated. (ix86_expand_int_cfmovcc): Expand to cfcmov pattern. * config/i386/i386-opts.h (enum apx_features): New. *

RE: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-11 Thread Kong, Lingling
> -Original Message- > From: Richard Sandiford > Sent: Friday, September 6, 2024 5:19 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Jeff Law ; Richard Biener > ; Uros Bizjak ; Hongtao Liu > ; Jakub Jelinek > Subject: Re: [PATCH v3 1/2] [APX CFCMO

[PATCH] i386: Fix scalar VCOMSBF16 which only compares low word

2024-10-09 Thread Kong, Lingling
Hi, Fixed scalar VCOMSBF16 misused in AVX10.2. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m64}. Ok for trunk? gcc/ChangeLog: * config/i386/sse.md (avx10_2_comsbf16_v8bf): Fixed scalar operands. --- gcc/config/i386/sse.md | 8 ++-- 1 file changed, 6 insertions(+), 2

RE: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-19 Thread Kong, Lingling
> > "Kong, Lingling" writes: > > > Hi, > > > > > > This version has added a new optab named 'cfmovcc'. The new optab is > > > used in the middle end to expand to cfcmov. And simplified my patch > > > by trying to generate th

[PATCH] i386: Update the comment for mapxf option

2024-09-18 Thread Kong, Lingling
Hi, Now that the APX NF, CCMP and NF features are supported, the comment for the APX option also needs to be updated. Ok for trunk? gcc/ChangeLog: * config/i386/i386.opt: Update the features included in apxf. --- gcc/config/i386/i386.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/g

[PATCH v4 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-11-13 Thread Kong, Lingling
Hi, Many thanks to Richard for the suggestion that a conditional load is like a scalar instance of maskload_optab. So this version uses the maskload and maskstore optabs to expand and generate cfcmov in the ifcvt pass. All the changes passed bootstrap & regtest x86-64-pc-linux-gnu. We also tested spec

[PATCH v4 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-11-13 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_int_cfmovcc): Expand to cfcmov pattern. * config/i386/i386-opts.h (enum apx_features): New. * config/i386/i386-protos.h (ix86_expand_int_cfmovcc): Define. * config/i386/i386.cc (ix86_rtx_costs): Add U

RE: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix

2024-11-24 Thread Kong, Lingling
Hi, LGTM. Now Hongyu and Hongtao are working on APX. Thanks, Lingling > -Original Message- > From: Gregory Kanter > Sent: Saturday, November 23, 2024 8:16 AM > To: gcc-patches@gcc.gnu.org > Cc: Kong, Lingling ; Gregory Kanter > > Subject: Patch ping - [PATCH] [A

[PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-13 Thread Kong, Lingling via Gcc-patches
Hi, The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result. These intrinsics have changed over time, like `_mm_comieq_ss ` old operation is `RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation update is `RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1
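The difference between the old and new operations can be sketched with a small C model of the flags that COMISS/UCOMISS produce. The helper names are hypothetical, and the model assumes the old code generation tested ZF alone, which is one reading of the "add PF result" wording in the subject rather than something this snippet states outright.

```c
#include <math.h>

/* Flags as set by COMISS/UCOMISS: an unordered compare (either operand
   is NaN) sets ZF = PF = CF = 1; equal sets only ZF; less-than sets only
   CF; greater-than clears all three.  */
struct comi_flags { int zf, pf, cf; };

static struct comi_flags
comi_model (float a, float b)
{
  struct comi_flags f = { 0, 0, 0 };
  if (isnan (a) || isnan (b))
    f.zf = f.pf = f.cf = 1;
  else if (a == b)
    f.zf = 1;
  else if (a < b)
    f.cf = 1;
  return f;
}

/* Assumed old behaviour: test ZF only, so an unordered compare is
   reported as "equal".  */
static int
comieq_old (float a, float b)
{
  return comi_model (a, b).zf;
}

/* New behaviour per the quoted operation: both operands must be non-NaN
   (PF == 0) and the values must compare equal (ZF == 1).  */
static int
comieq_new (float a, float b)
{
  struct comi_flags f = comi_model (a, b);
  return f.zf && !f.pf;
}
```

With a NaN operand the two variants disagree: the ZF-only test returns 1, while the ZF-and-not-PF test returns 0, matching the updated intrinsic description.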

[PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-07-25 Thread Kong, Lingling via Gcc-patches
Hi, This patch enables the __bf16 scalar type for targets with SSE2 and above, according to the psABI (https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/35/diffs). The __bf16 type is a storage type, as on Arm. OK for master? gcc/ChangeLog: * config/i386/i386-builtin-types.def (BFLOAT16): New pr

RE: [PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-08-03 Thread Kong, Lingling via Gcc-patches
Hi, The old patch had a mistake in `*movbf_internal`; the BFmode constant double move is now disabled in `*movbf_internal`. Thanks, Lingling > -Original Message- > From: Kong, Lingling > Sent: Tuesday, July 26, 2022 9:31 AM > To: Liu, Hongtao ; gcc-patches@gcc.gnu.org > Cc:

[PATCH] i386: Fix _mm512_fpclass_ps_mask in O0 [PR 101471]

2021-08-24 Thread Kong, Lingling via Gcc-patches
Hi, For _mm512_fpclass_ps_mask at -O0, the mask should be (__mmask16)-1 instead of (__mmask8)-1. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * gcc/config/i386/avx512dqintrin.h : fix _mm512_fpclass_ps_mask define in O0 gcc/testsuite/ChangeLog: * gcc.target/
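Why the mask width matters can be seen without any AVX-512 hardware. The sketch below uses plain 8-bit and 16-bit unsigned typedefs as stand-ins for `__mmask8` and `__mmask16`; `mask8`, `mask16` and `lanes_enabled` are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Stand-ins for the intrinsic mask types: 8 and 16 bits wide.  */
typedef uint8_t mask8;
typedef uint16_t mask16;

/* Count how many of the 16 float lanes of a 512-bit vector a mask
   enables (one bit per lane).  */
static unsigned
lanes_enabled (uint16_t mask)
{
  unsigned count = 0;
  while (mask)
    {
      count += mask & 1;
      mask >>= 1;
    }
  return count;
}
```

Here `lanes_enabled ((mask8) -1)` is 8 while `lanes_enabled ((mask16) -1)` is 16, so an all-ones 8-bit mask leaves the upper eight lanes of a 16-lane operation disabled.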

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-24 Thread Kong, Lingling via Gcc-patches
Hi, For avx512f_scattersi, the mask operand only affects the set source; we need to refine the pattern so that GCC knows the mask register also affects the destination. So we put the mask operand into UNSPEC_VSIBADDR. Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. Ok for master? gcc/ChangeLog: *config/i3

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-26 Thread Kong, Lingling via Gcc-patches
Hi, For avx512f_scattersi, the mask operand only affects the set source; we need to refine the pattern so that GCC knows the mask register also affects the destination. So we put the mask operand into UNSPEC_VSIBADDR. Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. Ok for master? gcc/ChangeLog: P

[PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-14 Thread Kong, Lingling via Gcc-patches
Hi, This patch fixes vec_init_dup_v16bf by adding correct handling for V16BF mode in ix86_expand_vector_init_duplicate. It also adds a testcase with SSE2 but without AVX2. OK for master? gcc/ChangeLog: PR target/106887 * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Fixe

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-15 Thread Kong, Lingling via Gcc-patches
it in new patch. Thanks. Ok for master ? Thanks, Lingling > -Original Message- > From: Richard Biener > Sent: Wednesday, September 14, 2022 4:16 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > Subject: Re: [PATCH] Enhance final_value_replaceme

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Kong, Lingling via Gcc-patches
anks again for take a look. OK for master ? Thanks, Lingling > -Original Message- > From: Hongtao Liu > Sent: Thursday, September 15, 2022 11:46 AM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-19 Thread Kong, Lingling via Gcc-patches
t; > .. > > else if (tree_fits_uhwi_p (niter) > > ... bitwise induction case...) > > ... > > > Yes, I fixed it in new patch. Thanks. > Ok for master ? > > Thanks, > Lingling > > > -Original Message- > > From: Richard Biener

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. And cleared before conversion, updated movhi_internal and ix86_can_change_mode_class. OK for master? gcc/ChangeLog: PR target/102811

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
insn can optimize scalar load to a vector. Thanks, Lingling -Original Message- From: Uros Bizjak Sent: Wednesday, November 24, 2021 3:57 PM To: Kong, Lingling Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Floa

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. Cleared before conversion, updated movhi_internal and ix86_can_change_mode_class. And fixed some commit message. OK for master? gcc/ChangeLog:

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
OK, This is the patch I prepare to check in. -Original Message- From: Uros Bizjak Sent: Wednesday, November 24, 2021 4:49 PM To: Kong, Lingling Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-23 Thread Kong, Lingling via Gcc-patches
en Jiang via Gcc-patches > wrote: > > > > From: Kong Lingling > > +(define_insn "vbcstne2ps_" > > + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") > > +(vec_duplicate:VF1_128_256 > > + (unspec:SF > > +

[PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-27 Thread Kong, Lingling via Gcc-patches
Hi, Previously we used unsigned short to represent bf16. That is not a good representation, but at the time the front end didn't support a bf16 type. Now that __bf16 has been introduced to the x86 psABI, we can switch the intrinsics to the new type. Ok for trunk? Thanks, Lingling gcc/ChangeLog: * config/i386

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-28 Thread Kong, Lingling via Gcc-patches
ctober 25, 2022 1:23 PM > To: Kong, Lingling > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang, > Haochen > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT > > On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling > wrote: > > > > > From: Gcc-patches > >

[wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Kong, Lingling via Gcc-patches
Hi, This patch mentions Intel __bf16 support in AVX512BF16 intrinsics. Ok for master? Thanks, Lingling --- htdocs/gcc-13/changes.html | 2 ++ 1 file changed, 2 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 7c6bfa6e..cd0282f1 100644 --- a/htdocs/

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-01 Thread Kong, Lingling via Gcc-patches
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html > > index 7c6bfa6e..cd0282f1 100644 > > --- a/htdocs/gcc-13/changes.html > > +++ b/htdocs/gcc-13/changes.html > > @@ -230,6 +230,8 @@ a work-in-progress. > >For both C and C++ the __bf16 type is supported on > >x86

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-02 Thread Kong, Lingling via Gcc-patches
> > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html > > > index 7c6bfa6e..cd0282f1 100644 > > > --- a/htdocs/gcc-13/changes.html > > > +++ b/htdocs/gcc-13/changes.html > > > @@ -230,6 +230,8 @@ a work-in-progress. > > >For both C and C++ the __bf16 type is supported on >

[PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Kong, Lingling via Gcc-patches
Hi, This patch supports vector init/broadcast/set/extract for the __bf16 type. The __bf16 type is a storage type. OK for master? gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector BFmode. (ix86_expand_vector_init_duplicate): Support vector BF

[PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-17 Thread Kong, Lingling via Gcc-patches
Hi, This patch is for pr105735/pr101991. It enables the optimization below: { - long unsigned int bit; - - [local count: 32534376]: - - [local count: 1041207449]: - # tmp_10 = PHI - # bit_12 = PHI - tmp_7 = bit2_6(D) & tmp_10; - bit_8 = bit_12 + 1; - if (bit_8 != 32) -goto ; [96.97
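For readers unfamiliar with the pattern, here is a minimal C sketch of the kind of loop the truncated GIMPLE above appears to describe: a value repeatedly ANDed with a loop-invariant operand over a fixed 32-iteration count. The function names, parameter types, and the closed form are assumptions made for illustration; they are not taken from the patch or its test cases.

```c
/* Hypothetical source loop: tmp is ANDed with the loop-invariant
   bit2 on every iteration, 32 times in total. */
unsigned int and_loop(unsigned int tmp, unsigned int bit2)
{
    for (unsigned long bit = 0; bit < 32; bit++)
        tmp &= bit2;
    return tmp;
}

/* Hand-written closed form: because the loop body runs at least once
   and AND with the same operand is idempotent, the final value is
   simply tmp & bit2. */
unsigned int and_closed_form(unsigned int tmp, unsigned int bit2)
{
    return tmp & bit2;
}
```

Both functions return the same value for every input, which is the property that would let a compiler replace the loop's final value with a single bitwise operation.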

[wwwdocs] [GCC13] Mention Intel __bf16 support.

2022-08-18 Thread Kong, Lingling via Gcc-patches
Hi, This patch mentions Intel __bf16 support in GCC 13. OK for master? Thanks, Lingling htdocs/gcc-13/changes.html | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 57bd8724..7d98329c 100644 --- a/htdocs/g

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-22 Thread Kong, Lingling via Gcc-patches
Hi Richard, could you please take a look at this patch? > Hi, > > This patch is for pr105735/pr101991. It will enable below optimization: > { > - long unsigned int bit; > - > - [local count: 32534376]: > - > - [local count: 1041207449]: > - # tmp_10 = PHI > - # bit_12 = PHI > - tmp

[PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-25 Thread Kong, Lingling via Gcc-patches
Hi, Current GCC cannot recognize a conditional mult reduction, so the following loop cannot be vectorized. This patch adds MULT_EXPR recognition for conditional scalar reduction. float summa(int n, float *arg1, float *arg2) { int i;
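The snippet above is cut off before the loop body, so the following is only an illustrative sketch of what a conditional multiplication reduction looks like in C. The function name, parameter names, and the condition are hypothetical and are not the test case from the patch.

```c
/* Hypothetical example of a conditional scalar reduction using
   multiplication: the accumulator is updated only for elements
   whose guard value is positive. */
float cond_mult_reduction(int n, const float *vals, const float *guard)
{
    float res = 1.0f;
    for (int i = 0; i < n; i++) {
        if (guard[i] > 0.0f)   /* the condition guards the update */
            res *= vals[i];
    }
    return res;
}
```

The update `res *= vals[i]` under an `if` is the shape that has to be recognized as a reduction (with 1.0 as the neutral value for skipped elements) before a loop like this can be if-converted and vectorized.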

[PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-08-30 Thread Kong, Lingling via Gcc-patches
Hi, Handle E_V8BFmode in expand_vec_perm_broadcast_1 and ix86_expand_vector_init_duplicate. Ok for trunk? gcc/ChangeLog: PR target/106742 * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Handle V8BF mode. (expand_vec_perm_broadcast_1): Ditto. gc

RE: [PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-31 Thread Kong, Lingling via Gcc-patches
Hi Richard, could you please take a look at this patch? OK for master? > Hi, > > The conditional mult reduction cannot be recognized with current GCC. The > following loop cannot be vectorized. > Now add MULT_EXPR recognition for conditional scalar reduction. > > float summa(int n, float *

RE: [PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-09-02 Thread Kong, Lingling via Gcc-patches
Hi, I fixed it in a new patch, and also added BF vector mode in SUBST_V and avx512fmaskhalfmode for @vec_interleave_high. OK for trunk? > > Hi, > > > > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and > ix86_expand_vector_init_duplicate. > > Ok for trunk? > > > > gcc/ChangeLog: > > > >

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-13 Thread Kong, Lingling via Gcc-patches
> + if ((bitinv_def > > please use else if here Sorry, if I use else if here, there is no corresponding if above it. I'm not sure whether you mean changing the bitwise induction expression's if to else if. Do you agree with these changes? Thanks again for taking a look. Thanks, Lingling >

[PATCH] i386: Support complex fma/conj_fma for _Float16.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi, This patch supports cmla_optab, cmul_optab, cmla_conj_optab, and cmul_conj_optab for vector _Float16. OK for master? gcc/ChangeLog: * config/i386/sse.md (cmul3): add new define_expand. (cmla4): Likewise gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vector-

[PATCH] i386: Optimization for mm512_set1_pch.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi, This patch folds _mm512_fmadd_pch (a, _mm512_set1_pch(*(b)), c) into a single instruction, vfmaddcph (%rsp){1to16}, %zmm1, %zmm2. OK for master? gcc/ChangeLog: * config/i386/sse.md (fma___pair): Add new define_insn. (fma__fmaddc_bcst): Add new define_insn_and_spli

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c, so this patch adds define_insn extendhfsf2 and truncsfhf2 for target_f16c. OK for master? gcc/ChangeLog: PR target/102811 * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for f16c. (extendh

[PATCH] i386: add alias for f*mul_*ch intrinsics

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi, This patch adds aliases for the f*mul_*ch intrinsics. OK for master? gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for _mm512_fmul_pch. (_mm512_mask_mul_pch): Likewise. (_mm512_maskz_mul_pch): Likewise. (_mm512_mul_round_pch): L
