[PATCH v2 01/18] Initial support for -mevex512

2023-10-06 Thread Haochen Jiang
Hi all, Sorry for the delay in revising the patch; I just got back from vacation. I have slightly revised this patch for the __EVEX256__ request with the code: diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index 47768fa0940..9c44bd7fb63 100644 --- a/gcc/config/i386/i386-c.cc +++

[PATCH 0/3] Add Intel new cpu archs

2023-10-15 Thread Haochen Jiang
Hi all, The patches aim to add new cpu archs Clear Water Forest and Panther Lake. Here comes the documentation: https://cdrdv2.intel.com/v1/dl/getContent/671368 Also in the patches, I refactored how we detect cpu according to features and added m_CORE_ATOM. Regtested on x86_64-pc-linux-gnu. Ok

[PATCH 2/3] x86: Add m_CORE_HYBRID for hybrid clients tuning

2023-10-15 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/i386-options.cc (m_CORE_HYBRID): New. * config/i386/x86-tune.def: Replace hybrid client tune with m_CORE_HYBRID. --- gcc/config/i386/i386-options.cc | 1 + gcc/config/i386/x86-tune.def| 113 ++-- 2 files changed,

[PATCH 1/3] Initial Clear Water Forest Support

2023-10-15 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Clear Water Forest. * common/config/i386/i386-common.cc (processor_name): Add Clear Water Forest. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processo

[PATCH 3/3] Initial Panther Lake Support

2023-10-15 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_name): Add Panther Lake. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_PANTHERLAKE. * config.gcc: Add -march=pantherlake. *

[PATCH] x86: Correct ISA enabled for clients since Arrow Lake

2023-10-18 Thread Haochen Jiang
Hi all, I just found that since ISAs enabled on Sierra Forest changed, clients since Arrow Lake will wrongly enable ENQCMD according to the current code. To avoid messing up again in the future, I changed the dependency on how ISAs are enabled currently by making clients depending on clients and

[PATCH v2] x86: Correct ISA enabled for clients since Arrow Lake

2023-10-18 Thread Haochen Jiang
Hi all, I slightly adjusted the patch. There is no functional change in v2, only formatting and ordering fixes. Thx, Haochen gcc/ChangeLog: * config/i386/i386.h: Correct the ISA enabled for Arrow Lake. Also make Clearwater Forest depend on Sierra Forest. *

[PATCH] i386: Prevent splitting to xmm16+ when !TARGET_AVX512VL

2023-10-19 Thread Haochen Jiang
Hi all, Currently, there will be a chance in split to use x/ymm16+ w/o AVX512VL, which finally leads to an ICE as pr111753 does. This patch aims to fix that. Regtested on x86_64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/111753 * config/i386/i386.cc (ix8

[gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march support

2023-10-22 Thread Haochen Jiang
Hi all, This patch mentions recent updates to the x86-64 backend, including changes to the ISAs enabled on previously introduced CPUs and newly introduced options/ISAs/CPUs. Ok for wwwdocs? Thx, Haochen --- htdocs/gcc-13/changes.html | 8 htdocs/gcc-14/changes.html | 19 +++ 2 files

[PATCH] Fix incorrect option mask and avx512cd target push

2023-10-30 Thread Haochen Jiang
Hi all, This patch fixes two obvious bugs in the current evex512 implementation. Also, I moved the AVX512CD+AVX512VL part out of AVX512VL to avoid accidentally missing avx512cd handling in the future. Ok for trunk? BRs, Haochen gcc/ChangeLog: * config/i386/avx512cdintrin.h (target): Push evex5

[PATCH 0/4] Fix no-evex512 function attribute

2023-10-30 Thread Haochen Jiang
Hi all, These four patches fix the no-evex512 function attribute. The issue is described here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889 My proposal for this problem is to also push "no-evex512" when defining 128/256 intrins in AVX512. Besides, I added some new in

[PATCH 4/4] Push no-evex512 target for 128/256 bit intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: PR target/111889 * config/i386/avx512bf16intrin.h: Push no-evex512 target. * config/i386/avx512bf16vlintrin.h: Ditto. * config/i386/avx512bitalgvlintrin.h: Ditto. * config/i386/avx512bwintrin.h: Ditto. * config/i386/avx512dqintrin.h: D

[PATCH 2/4] [PATCH 2/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx512bf16vlintrin.h: Change intrin call. * config/i386/avx512fintrin.h (_mm_avx512_undefined_ps): New. (_mm_avx512_undefined_pd): Ditto. (__attribute__): Change intrin call. * config/i386/avx512vbmivlintrin.h: Ditto.

[PATCH 3/4] [PATCH 3/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx512bf16vlintrin.h (_mm_avx512_castsi128_ps): New. (_mm256_avx512_castsi256_ps): Ditto. (_mm_avx512_slli_epi32): Ditto. (_mm256_avx512_slli_epi32): Ditto. (_mm_avx512_cvtepi16_epi32): Ditto. (_mm256_avx512_cvtep

[PATCH] Add -mevex512 into invoke.texi

2024-01-09 Thread Haochen Jiang
Hi Richard, It seems that I sent out an outdated patch. This is the patch I meant to send. Thx, Haochen gcc/ChangeLog: * doc/invoke.texi: Add -mevex512. --- gcc/doc/invoke.texi | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc

[PATCH] i386: Add AVX10.1 related macros

2024-01-09 Thread Haochen Jiang
Hi all, This patch adds AVX10.1-related macros at libgomp's request. The request is here: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/113288 * config/i386/i386-c.cc (ix86_target_macros_i
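
A minimal sketch of how user code might consume such macros. The macro names `__AVX10_1_512__` and `__AVX10_1_256__` are assumptions (as recalled from GCC 14; the truncated ChangeLog above does not list them), and the helper name `avx10_1_level` is hypothetical. On compilers that do not define these macros the function falls through to "none".

```c
#include <stddef.h>

/* Report which AVX10.1 level this translation unit was compiled for.
   The macro names are assumptions and should be checked against the
   compiler's documentation before relying on them. */
static const char *avx10_1_level(void)
{
#if defined(__AVX10_1_512__)
    return "avx10.1-512";
#elif defined(__AVX10_1_256__)
    return "avx10.1-256";
#else
    return "none";
#endif
}
```

Keeping the check at preprocessing time means the same source builds with older compilers, which simply take the final branch.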

[PATCH] i386: Remove redundant move in vnni pattern

2024-01-11 Thread Haochen Jiang
Hi all, This patch removes all redundant SETs in vnni patterns. Ok for trunk? Thx, Haochen gcc/ChangeLog: * config/i386/sse.md (sdot_prod): Remove redundant SET. (usdot_prod): Ditto. (sdot_prod): Ditto. (udot_prod): Ditto. --- gcc/config/i386/sse.md | 4 1

[PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-21 Thread Haochen Jiang
Hi all, Recently, I happened to run i386.exp under -DDEBUG and found some failures. This patch aims to fix them. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/adx-check.h: Include stdio.h when DEBUG is defined. * gcc.target/i386/avx512fp16-vscalefph-

[PATCH] i386: Remove RAO-INT from Grand Ridge

2023-12-13 Thread Haochen Jiang
Hi all, According to ISE050 published at the end of September, RAO-INT will not be in Grand Ridge anymore. This patch aims to remove it. The documentation is available here: https://cdrdv2.intel.com/v1/dl/getContent/671368 Regtested on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC13? Thx

[PATCH] i386: Allow 64 bit mask register for -mno-evex512

2023-12-14 Thread Haochen Jiang
Hi all, A recent change in the AVX10 documentation allows 64-bit mask register instructions in AVX10-256. The documentation is available here: Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification https://cdrdv2.intel.com/v1/dl/getContent/784267 The Converged Vecto

[gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend

2023-12-21 Thread Haochen Jiang
Hi all, This is the v2 patch for the wwwdocs change, addressing the review comments. If there is no objection, I will push this change next Tuesday. Changes in v2: - Remove RAO-INT from Grand Ridge - Remove the mask register restriction for -mno-evex512 - Arrange the options alphabetically - Other

[PATCH] i386: Fix recent testcase fail

2024-01-08 Thread Haochen Jiang
After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break vectorization is supported. The two testcases need to be fixed. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase. * gcc.target/i386/part-vect-absneghf.c: Ditto. --- gcc/testsuite/gcc

[PATCH] Add -mevex512 into invoke.texi

2024-01-08 Thread Haochen Jiang
Hi all, In invoke.texi, -mevex512 is missing. This patch adds that. Ok for trunk? Thx, Haochen gcc/ChangeLog: * doc/invoke.texi: Add -mevex512. --- gcc/doc/invoke.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6

[RFC] i386: Remove Xeon Phi ISA support

2023-11-30 Thread Haochen Jiang
Hi all, Since the Knights Landing and Knights Mill microarchitectures were EOL in 2019, and ICC and ICX have already removed the support and emit errors, we would also like to remove the support in GCC to reduce maintenance effort. The deprecated Xeon Phi ISAs are AVX512PF, AVX512ER, AVX5124VNNIW,

[PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-11-30 Thread Haochen Jiang
Since the Knights Landing and Knights Mill microarchitectures are EOL, we would like to remove their support in GCC 15. In GCC 14, we will first emit a warning for the usage. gcc/ChangeLog: * config/i386/driver-i386.cc (host_detect_local_cpu): Do not append "-mno-" for Xeon Phi ISAs.

[gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for x86_64 backend

2023-12-07 Thread Haochen Jiang
Hi all, This patch mentions the following changes in wwwdocs for the x86_64 backend: - AVX10.1 support - APX EGPR, PUSH2POP2, PPX and NDD support - Xeon Phi ISAs deprecated I also adjusted the wording in the x86_64 part for GCC 13. Ok for gcc-wwwdocs? Thx, Haochen Mention AVX10.1 support, APX su

[PATCH] i386: Fix PR110790 testcase

2023-12-12 Thread Haochen Jiang
Hi all, This patch fixes the previously introduced testcase failure. Approved in another thread: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640288.html Pushed to trunk. Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/pr110790-2.c: Change scan-assembler from shrq

[PATCH] i386: Fix isa attribute for TI/TF andnot mode

2023-11-06 Thread Haochen Jiang
Hi all, This patch aims to fix the wrong isa attribute which caused the regression in PR111907. Regtested on x86_64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/111907 * config/i386/i386.md (avx_noavx512vl): Add missing definition. * config/i386/sse.md

[RFC] Intel AVX10.1 Compiler Design and Support

2023-11-09 Thread Haochen Jiang
Hi all, This RFC patch aims to add AVX10.1 options. Now that -m[no-]evex512 support has been added, it is much easier to add them compared to the August version. Details on AVX10 are given below: Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification It describes the Intel Advan

[PATCH] Initial support for AVX10.1

2023-11-09 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Add avx10_set and version and detect avx10.1. (cpu_indicator_init): Handle avx10.1-512. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_1_256_SET): New. (OPTION_MASK_

[PATCH] i386: Fix AVX512 and AVX10 option issues

2023-11-22 Thread Haochen Jiang
Hi all, This patch should be able to fix the current issue mentioned in PR112643. Also, I fixed some legacy issues in code related to AVX512/AVX10. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/112643 * config/i386/driver-i386.cc (check_avx10_avx512_features): Re

[PATCH] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Haochen Jiang
Hi all, The AVX10 documentation specifies an ecx value of 0 for the AVX10 version and vector size under the 0x24 leaf. Although the bits for ecx=1 are all reserved for now, we still need to specify ecx as 0 to avoid a dirty value in ecx. Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC
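
The point about passing an explicit subleaf can be illustrated with a short sketch. This is not the patch's code: the helper names are hypothetical, and the EBX field layout (version in bits 7:0, 256-bit support in bit 17, 512-bit support in bit 18) is an assumption based on the AVX10 specification as understood at the time. `__get_cpuid_count` is the helper from GCC's `<cpuid.h>`, which takes the subleaf as its second argument.

```c
#include <stdint.h>

/* Decode EBX as returned by CPUID.(EAX=0x24, ECX=0).
   The bit positions are assumptions, not taken from the patch. */
static unsigned avx10_version(uint32_t ebx) { return ebx & 0xffu; }
static int avx10_has_256(uint32_t ebx) { return (int)((ebx >> 17) & 1u); }
static int avx10_has_512(uint32_t ebx) { return (int)((ebx >> 18) & 1u); }

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query leaf 0x24 with the subleaf (ECX) explicitly set to 0, so no
   stale register value leaks into the query. Returns 1 and stores EBX
   on success, 0 if the leaf is beyond the maximum supported leaf.
   A production check should first confirm that AVX10 is enumerated
   at all; that step is omitted here. */
static int query_avx10_ebx(uint32_t *ebx_out)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    *ebx_out = ebx;
    return 1;
}
#endif
```

Separating the pure decoders from the hardware query keeps the bit handling checkable on any host.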

[PATCH] i386: Use BLKmode for {ld,st}tilecfg

2024-07-17 Thread Haochen Jiang
Hi all, For memory-related AMX instructions, we will treat the memory size as unspecified, since there are no differing sizes that could cause confusion for the memory operand. This changes the output under Intel mode, which is currently broken when used with the assembler, and aligns to current binutils behavio

[PATCH] i386: Fix testcases generating invalid asm

2024-07-17 Thread Haochen Jiang
Hi all, For compile test, we should generate valid asm except for special purposes. Fix the compile test that generates invalid asm. Regtested on x86-64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/apx-egprs-names.c: Use ax for short and a

[PATCH v2] i386: Fix testcases generating invalid asm

2024-07-17 Thread Haochen Jiang
Hi all, I revised the patch according to the comment. Ok for trunk? Thx, Haochen --- Changes in v2: Add suffix for mov to make the test more robust. --- For compile test, we should generate valid asm except for special purposes. Fix the compile test that generates invalid asm. gcc/testsuite

[PATCH 1/2] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

2024-05-14 Thread Haochen Jiang
Previously, we used 16:11:8 in generic tune for Intel processors, which led to cross-cache-line issues and resulted in random performance penalties in benchmarks with small loops from commit to commit. Always aligning to 16 bytes solves the issue to some extent. gcc/ChangeLog:

[PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-14 Thread Haochen Jiang
n. We planned to backport it to GCC14.2. Thx, Haochen Haochen Jiang (1): Adjust generic loop alignment from 16:11:8 to 16 for Intel processors liuhongt (1): Align tight&hot loop without considering max skipping bytes. gcc/config/i386/i386.cc | 148 ++- g

[PATCH 2/2] Align tight&hot loop without considering max skipping bytes.

2024-05-14 Thread Haochen Jiang
From: liuhongt When a hot loop is small enough to fit into one cacheline, we should align the loop with ceil_log2 (loop_size) without considering the maximum skip bytes. It will help code prefetch. gcc/ChangeLog: * config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change gen_pad to
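
The alignment rule described above can be sketched as plain arithmetic. This is an illustration only, not the code in i386.cc: the function names are hypothetical, and the cache-line size is passed in as a parameter rather than taken from any tuning table.

```c
/* Smallest n such that (1 << n) >= x, for 1 <= x <= 2^31. */
static unsigned ceil_log2_u(unsigned x)
{
    unsigned n = 0;
    while (n < 31 && (1u << n) < x)
        n++;
    return n;
}

/* Hypothetical helper: alignment in bytes for a hot loop that fits in
   one cache line, or 0 meaning "fall back to the default policy". */
static unsigned tight_loop_alignment(unsigned loop_size, unsigned cacheline)
{
    if (loop_size == 0 || loop_size > cacheline)
        return 0;
    return 1u << ceil_log2_u(loop_size);
}
```

For example, a 24-byte loop with a 64-byte cache line would be aligned to 32 bytes, so the whole loop body sits inside a single line.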

[PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-20 Thread Haochen Jiang
Hi all, Since vpermq is really slow, we should avoid using it when it is the only instruction that could be used for ix86_expand_vecop_qihi2. Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/115069 * config/i386/i386-expand.cc (i

[PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang
Hi all, This is the v2 patch to fix PR115069. The new testcase has passed. Changes in v2: - Added a testcase. - Change the comment for the early exit. Thx, Haochen Since vpermq is really slow, we should avoid using it for permutation when vpmovwb is not available (needs AVX512BW) for ix86_e

[PATCH v3] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang
Hi all, This is the v3 patch to fix PR115069. The new testcase has passed. Changes in v3: - Simplify the testcase. Changes in v2: - Add a testcase. - Change the comment for the early exit. Thx, Haochen Since vpermq is really slow, we should avoid using it for permutation when vpmovwb is

[PATCH] Add AVX10.1 target_clones support

2024-05-28 Thread Haochen Jiang
Hi all, Since AVX10 is the first major ISA introduced after AVX-512, we propose to add target_clones support for it. Although AVX10.1-256 won't cover the 512-bit part of AVX512F, this won't be an issue, since the ordering is only used for priority and not for implication. Bootstrapped and regtested on x86_64-pc-

[PATCH 06/22] AVX10.2 ymm rounding: Support vcvtps2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 12/22] AVX10.2 ymm rounding: Support vfmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmadd__mask3): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 02/22] AVX10.2 ymm rounding: Support vcvtdq2p{s, h} and vcvtpd2p{s, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_built

[PATCH 10/22] AVX10.2 ymm rounding: Support vcvt{, u}w2ph and vdivp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 08/22] AVX10.2 ymm rounding: Support vcvttph2{, u}{dq, qq, w} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (avx512fp16_fix_trunc2): Ex

[PATCH 05/22] AVX10.2 ymm rounding: Support vcvtph2{, u}w and vcvtps2p{d, hx} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 13/22] AVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmaddsub__mask): Add cond

[PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-14 Thread Haochen Jiang
Hi all, The initial patch for AVX10.2 has been merged this week. For the upcoming patches, we will first upstream the ymm rounding control part. In the ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding control will also have 256-bit rounding control in AVX10.2. For clarity, the

[PATCH 03/22] AVX10.2 ymm rounding: Support vcvtpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_built

[PATCH 07/22] AVX10.2 ymm rounding: Support vcvtqq2p{s, d, h} and vcvttpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 09/22] AVX10.2 ymm rounding: Support vcvttps2{, u}{dq, qq} and vcvtu{dq, qq}2p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (unspec_fix_truncv8sfv8si2):

[PATCH 16/22] AVX10.2 ymm rounding: Support vfnmsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fnmsub__mask3): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 11/22] AVX10.2 ymm rounding: Support vfc{madd, mul}cph, vfixupimmp{s, d} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 15/22] AVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 19/22] AVX10.2 ymm rounding: Support vmulp{s, d, h} and vrangep{s, d} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 04/22] AVX10.2 ymm rounding: Support vcvtph2p{s, d, sx} and vcvtph2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 18/22] AVX10.2 ymm rounding: Support v{max, min}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 01/22] AVX10.2 ymm rounding: Support vadd{s, d, h} and vcmp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config.gcc: Add avx10_2roundingintrin.h. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle

[PATCH 14/22] AVX10.2 ymm rounding: Support vfm{sub, subadd}{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmsub__mask): Add conditi

[PATCH 21/22] AVX10.2 ymm rounding: Support vscalefp{s,d,h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/sse.md: (_scalef): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 20/22] AVX10.2 ymm rounding: Support vreducep{s, d, h} and vrndscalep{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (reducep): Add condition check. (_rndscale): Ditto. gcc/testsuite/ChangeLog:

[PATCH 17/22] AVX10.2 ymm rounding: Support vgetexpp{s, d, h} and vgetmantp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 22/22] AVX10.2 ymm rounding: Support vsqrtp{s, d, h} and vsubp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 00/12] AVX10.2: Support new instructions

2024-08-19 Thread Haochen Jiang
Hi all, The AVX10.2 ymm rounding patches were merged to trunk around 6 hours ago. As mentioned before, the next step will be AVX10.2 new instruction support. This patch series can be divided into three parts. The first patch will refactor m512-check.h under testsuite to reuse AVX-512 helper fun

[PATCH 01/12] i386: Refactor m512-check.h

2024-08-19 Thread Haochen Jiang
After the introduction of AVX10, we still want to use the AVX512 helper functions to avoid duplicate code. To reuse them, we need some refactoring to make sure each function definition happens under the correct ISA, avoiding ABI warnings. gcc/testsuite/ChangeLog: * gcc.target/i386/m512-check.h: Wr

[PATCH 03/12] [PATCH 2/2] AVX10.2: Support media instructions

2024-08-19 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx10_2-512mediaintrin.h: Add new intrins. * config/i386/avx10_2mediaintrin.h: Ditto. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-builtins.cc (def_builtin): Handle shared builtins between AVXVNNIINT16 and

[PATCH 02/12] [PATCH 1/2] AVX10.2: Support media instructions

2024-08-19 Thread Haochen Jiang
: Ditto. * gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto. Co-authored-by: Haochen Jiang --- gcc/config.gcc| 3 +- gcc/config/i386/avx10_2-512mediaintrin.h | 234 +++ gcc/config/i386

[PATCH 08/12] [PATCH 2/2] AVX10.2: Support saturating convert instructions

2024-08-19 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (avx10_2_vcvttpd2dqs): New. (avx10_2_vcvttpd2qqs): Ditto. (avx10_2_vcvttps2dqs): Ditto. (avx10_2_vcvttps2qqs):

[PATCH 07/12] [PATCH 1/2] AVX10.2: Support saturating convert instructions

2024-08-19 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config.gcc: Add avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI), (V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI), (V16

[PATCH 05/12] [PATCH 1/2] AVX10.2: Support BF16 instructions

2024-08-19 Thread Haochen Jiang
From: konglin1 gcc/ChangeLog: * config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h. * config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF, V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF, V8BF_FTYPE_V8BF_V8

[PATCH 06/12] [PATCH 2/2] AVX10.2: Support BF16 instructions

2024-08-19 Thread Haochen Jiang
From: konglin1 gcc/ChangeLog: * config/i386/avx10_2-512bf16intrin.h: Add new intrinsics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE for new type. * config/i386/i386-builtin.def (BDESC): Add ne

[PATCH 09/12] AVX10.2: Support minmax instructions

2024-08-19 Thread Haochen Jiang
gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto. Co-authored-by: Lin Hu Co-authored-by: Haochen Jiang --- gcc/config.gcc|3 +- gcc/config/i3

[PATCH 10/12] AVX10.2: Support vector copy instructions

2024-08-19 Thread Haochen Jiang
From: "Zhang, Jun" gcc/ChangeLog: * config/config.gcc: Add avx10_2copyintrin.h. * config/i386/i386.md (avx10_2): New isa attribute. * config/i386/immintrin.h: Include avx10_2copyintrin.h. * config/i386/sse.md (sse_movss_): Add new constraints to handle AVX

[PATCH 12/12] i386: Add bf8 -> fp16 intrin

2024-08-19 Thread Haochen Jiang
Since BF8 and FP16 have the same number of exponent bits, the type conversion between them is just a cast of the fraction part. We will use a sequence of instructions instead of new instructions to do that. For convenience, intrins are also provided. gcc/ChangeLog: * config/i386/avx10_2-512convertintri
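
The reasoning above can be shown with a scalar sketch on raw bit patterns. It assumes BF8 is the E5M2 format (1 sign, 5 exponent, 2 fraction bits), so widening to FP16 (1 sign, 5 exponent, 10 fraction bits) only appends eight zero fraction bits. The function name is hypothetical and this is not the intrinsic added by the patch.

```c
#include <stdint.h>

/* Widen a BF8 (assumed E5M2) bit pattern to an FP16 bit pattern.
   Sign and exponent fields line up, so the conversion is a left shift
   that pads the fraction with zeros. */
static uint16_t bf8_to_fp16_bits(uint8_t bf8)
{
    return (uint16_t)((uint16_t)bf8 << 8);
}
```

Under that assumption, 0x3C (1.0 in E5M2) maps to 0x3C00 (1.0 in FP16) and 0xC0 (-2.0) maps to 0xC000 (-2.0).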

[PATCH 11/12] AVX10.2: Support compare instructions

2024-08-19 Thread Haochen Jiang
in): Ditto. (ix86_expand_builtin): Change function call. * config/i386/i386.md (UNSPEC_COMX): New unspec. * config/i386/sse.md (avx10_2_vcomx): New. (_comi): Add HFmode. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-compare-1.c: New test. Co-authored-by: Hao

[PATCH 1/8] i386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/sse.md (VI1_AVX512VNNIBW): New. (VI2_AVX10_2): Ditto. (sdot_prod): Add AVX10.2 to auto vectorize and combine 512 bit part. (udot_prod): Ditto. (sdot_prodv64qi): Removed. (udot_prodv64qi): Ditto. (usdot_pro

[PATCH 2/8] i386: Optimize ordered and nonequal

2024-08-25 Thread Haochen Jiang
From: "Hu, Lin1" Currently, when we input !__builtin_isunordered (a, b) && (a != b), gcc will emit ucomiss %xmm1, %xmm0 movl $1, %ecx setp %dl setnp %al cmovne %ecx, %edx andl %edx, %eax movzbl %al, %eax In fact, xorl %eax, %eax ucomiss %xmm1, %xmm0 setne %al is better. gcc/
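
For reference, the source-level condition discussed above can be written as a small function. The function names are hypothetical. The second form uses `__builtin_islessgreater`, a GCC/Clang builtin that is true exactly when the operands are ordered and unequal; it is shown only as an equivalent spelling, not as what the patch does.

```c
/* The expression from the message: true when neither operand is NaN
   and the operands differ. */
static int ordered_and_nonequal(float a, float b)
{
    return !__builtin_isunordered(a, b) && (a != b);
}

/* Equivalent spelling: a < b || a > b, evaluated with a quiet
   comparison. */
static int ordered_and_nonequal_alt(float a, float b)
{
    return __builtin_islessgreater(a, b);
}
```

Both functions return 0 whenever either argument is NaN, which is the case the single `ucomiss` plus `setne` sequence has to handle correctly.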

[PATCH 0/8] i386: Optimize code with AVX10.2 new instructions

2024-08-25 Thread Haochen Jiang
Hi all, I committed the AVX10.2 new instruction patches to trunk hours ago. The next and final part of the AVX10.2 upstreaming is to optimize code with AVX10.2 new instructions. This patch series contains the following optimizations: - VNNI instruction auto vectorize (PATCH 1).

[PATCH 5/8] i386: Support vectorized BF16 FMA with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test. --- gcc/config/i386/sse.md

[PATCH 7/8] i386: Support vectorized BF16 sqrt with AVX10.2 instruction

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md: Expand VF2H to VF2HB with VBF modes. --- gcc/config/i386/sse.md | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b374783429c..2de592a9c8f 100644 ---

[PATCH 3/8] i386: Optimize generate insn for avx10.2 compare

2024-08-25 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to support the optimization. * config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ. * config/i386/i386.md (*cmpx): New define_insn. (*cmpxhf): Di

[PATCH 6/8] i386: Support vectorized BF16 smaxmin with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md (3): New define expand pattern for BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test. --- gcc/config/i3

[PATCH 8/8] i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode for int mask cmp. * config/i386/sse.md (vec_cmp): New vec_cmp expand for VBF modes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c:

[PATCH 4/8] i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu AVX10.2 introduces several non-exception instructions for BF16 vector. Enable vectorized BF add/sub/mul/div operation by supporting standard optab for them. gcc/ChangeLog: * config/i386/sse.md (div3): New expander for BFmode div. (VF_BHSD): New mode iterator with

[gcc-wwwdocs PATCH] gcc-15: Mention recent update for x86_64 backend

2024-08-27 Thread Haochen Jiang
Hi all, Sorry for the noise: I mistyped gcc-patches as gcc-patchs, so I am resending the patch. This patch adds documentation for recent updates in the x86-64 backend. Ok for wwwdocs trunk? Thx, Haochen --- Mention AVX10.2 support and Xeon Phi removal in GCC 15. --- htdocs/gcc-15/changes.html

[PATCH] i386: Change prefetchi output template

2024-07-21 Thread Haochen Jiang
Hi all, For prefetchi instructions, a RIP-relative address is explicitly specified for the operand, and the assembler obeys that rule strictly. This makes an instruction like: prefetchit0 bar illegal for the assembler, even though it should be a common usage of prefetchi. Explicitly add (%rip) after funct

[PATCH v2] i386: Change prefetchi output template

2024-07-22 Thread Haochen Jiang
Hi all, I tested with %a and it works, so I suppose it is a better solution. Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC 13 and 14? Thx, Haochen --- Changes in v2: Use %a in pattern --- For prefetchi instructions, RIP-relative address is explici

[PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Haochen Jiang
Hi all, There are several typos in the AVX512 intrinsic macro definitions. They result in errors with -O0. This patch fixes them. Bootstrapped on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC 14, GCC 13 and GCC 12? Thx, Haochen gcc/ChangeLog: * config/i386/avx512dqintrin.

[PATCH] i386: Add non-optimize prefetchi intrins

2024-07-25 Thread Haochen Jiang
Hi all, Under -O0, with the "newly" introduced intrins, the variable will be transformed into a mem instead of the original symbol_ref. The compiler will then treat the operand as invalid and turn the operation into a nop, which is not expected. Use a macro for the non-optimized case to keep the variable as symbol_re

[PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-26 Thread Haochen Jiang
Hi all, I have added related testcases to the patch. Ok for trunk and backport to GCC 14, GCC 13 and GCC 12? Thx, Haochen --- Changes in v2: Add related testcases --- There are several typos in the AVX512 intrinsic macro definitions. Correct them to fix errors when compiling with -O0. gcc/ChangeLog

[PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-26 Thread Haochen Jiang
Hi all, I added a related -O0 testcase in this patch. Ok for trunk and backport to GCC 14 and GCC 13? Thx, Haochen --- Changes in v2: Add testcases. --- Under -O0, with the "newly" introduced intrins, the variable will be transformed into a mem instead of the original symbol_ref. The compiler will th

[GCC12/13 PATCH] i386: Use _mm_setzero_ps/d instead of _mm_avx512_setzero_ps/d for GCC13/12

2024-07-28 Thread Haochen Jiang
Hi all, In GCC 13/12, there is no _mm_avx512_setzero_ps/d since it was introduced in GCC 14. Fix the backport issue as obvious in: https://gcc.gnu.org/pipermail/gcc-regression/2024-July/080385.html Thx, Haochen gcc/ChangeLog: * config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use

[PATCH 0/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
Hi all, The AVX10.2 technical details were just published on July 31st at the following link: https://cdrdv2.intel.com/v1/dl/getContent/828965 For new features and instructions, we could divide them into two parts. One is ymm rounding control, the other is the new instructions. In the following week

[PATCH 1/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Handle avx10.2. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_2_256_SET): New. (OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto. (OPTION_MASK_ISA2_AVX10_1_256_UNSET):

[gcc-wwwdocs PATCH] Uncomment MCore part title

2024-04-12 Thread Haochen Jiang
Hi all, While checking the GCC 14 documentation, I found that the MCore part forgot to uncomment its title, which caused its documentation to be mixed with x86. Uncomment that and commit as obvious. Thx, Haochen --- htdocs/gcc-14/changes.html | 2 +- 1 file changed, 1 insertion(+), 1 deletio

[PATCH] i386: Fix Sierra Forest auto dispatch

2024-04-22 Thread Haochen Jiang
Hi all, This patch fixes a bug in the mapping which caused auto dispatch to fail. Sierra Forest is in processor_types enum, but not processor_subtypes. Committed as obvious and backport to GCC13. Thx, Haochen gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_alias_table):

[PATCH] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-23 Thread Haochen Jiang
Hi all, When -mavx10.1-256 is used on the command line together with avx10.1-256 in a target attribute, zmm should never be generated. But current GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly set AVX512. This patch will fix that issue. Regtested on x86_64-pc-linux-gnu.

[PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Haochen Jiang
Hi all, The array index should not be over 8 for v8hi, or it will fail under -O0 or using -fstack-protector. This patch aims to fix that, which is mentioned in PR110621. Commit as obvious and backport to GCC13. Thx, Haochen gcc/testsuite/ChangeLog: PR target/110621 * gcc.targe
