Hi all,
Sorry for the delay in revising the patch; I have just returned from vacation.
I have slightly revised this patch for the __EVEX256__ request with the code:
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..9c44bd7fb63 100644
--- a/gcc/config/i386/i386-c.cc
+++
Hi all,
These patches add the new CPU archs Clear Water Forest and
Panther Lake. The documentation is here:
https://cdrdv2.intel.com/v1/dl/getContent/671368
Also in the patches, I refactored how we detect the CPU from its features
and added m_CORE_ATOM.
Regtested on x86_64-pc-linux-gnu. Ok
gcc/ChangeLog:
* config/i386/i386-options.cc (m_CORE_HYBRID): New.
* config/i386/x86-tune.def: Replace hybrid client tune with
m_CORE_HYBRID.
---
gcc/config/i386/i386-options.cc | 1 +
gcc/config/i386/x86-tune.def | 113 ++--
2 files changed,
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Clear Water Forest.
* common/config/i386/i386-common.cc (processor_name):
Add Clear Water Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processo
gcc/ChangeLog:
* common/config/i386/i386-common.cc (processor_name):
Add Panther Lake.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
*
Hi all,
I just found that, since the ISAs enabled on Sierra Forest have changed, clients
from Arrow Lake onward wrongly enable ENQCMD in the current code.
To avoid messing this up again in the future, I changed how the ISA
dependencies are set up by making clients depend on clients and
Hi all,
I slightly adjusted the patch. The v2 patch makes no functional change,
only some formatting and ordering fixes.
Thx,
Haochen
gcc/ChangeLog:
* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
Also make Clearwater Forest depend on Sierra Forest.
*
Hi all,
Currently, a split may use x/ymm16+ without AVX512VL,
which eventually leads to an ICE, as in PR111753.
This patch aims to fix that.
Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
PR target/111753
* config/i386/i386.cc (ix8
Hi all,
This patch mentions recent updates to the x86-64 backend, including updates to the
ISAs enabled on previously introduced CPUs and newly introduced options/ISAs/CPUs.
Ok for wwwdocs?
Thx,
Haochen
---
htdocs/gcc-13/changes.html | 8
htdocs/gcc-14/changes.html | 19 +++
2 files
Hi all,
This patch fixes two obvious bugs in the current evex512 implementation.
Also, I moved the AVX512CD+AVX512VL part out of the AVX512VL part to avoid
accidentally missing its handling in avx512cd in the future.
Ok for trunk?
BRs,
Haochen
gcc/ChangeLog:
* config/i386/avx512cdintrin.h (target): Push evex5
Hi all,
These four patches fix the no-evex512 function attribute. The details
of the issue are here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889
My proposal for this problem is to also push "no-evex512" when defining
the 128/256-bit intrinsics in AVX512.
Besides, I added some new in
gcc/ChangeLog:
PR target/111889
* config/i386/avx512bf16intrin.h: Push no-evex512 target.
* config/i386/avx512bf16vlintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: Ditto.
* config/i386/avx512bwintrin.h: Ditto.
* config/i386/avx512dqintrin.h: D
gcc/ChangeLog:
* config/i386/avx512bf16vlintrin.h: Change intrin call.
* config/i386/avx512fintrin.h
(_mm_avx512_undefined_ps): New.
(_mm_avx512_undefined_pd): Ditto.
(__attribute__): Change intrin call.
* config/i386/avx512vbmivlintrin.h: Ditto.
gcc/ChangeLog:
* config/i386/avx512bf16vlintrin.h
(_mm_avx512_castsi128_ps): New.
(_mm256_avx512_castsi256_ps): Ditto.
(_mm_avx512_slli_epi32): Ditto.
(_mm256_avx512_slli_epi32): Ditto.
(_mm_avx512_cvtepi16_epi32): Ditto.
(_mm256_avx512_cvtep
Hi Richard,
It seems that I sent out an outdated patch. This is the patch
I meant to send.
Thx,
Haochen
gcc/ChangeLog:
* doc/invoke.texi: Add -mevex512.
---
gcc/doc/invoke.texi | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/gcc/doc/invoke.texi b/gcc/doc
Hi all,
This patch adds AVX10.1-related macros as requested for libgomp. The
request is here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
PR target/113288
* config/i386/i386-c.cc (ix86_target_macros_i
Hi all,
This patch removes all redundant SETs in the VNNI patterns.
Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
* config/i386/sse.md (sdot_prod): Remove redundant SET.
(usdot_prod): Ditto.
(sdot_prod): Ditto.
(udot_prod): Ditto.
---
gcc/config/i386/sse.md | 4
1
Hi all,
Recently, I happened to run i386.exp with -DDEBUG and found some failures.
This patch aims to fix that. Ok for trunk?
Thx,
Haochen
gcc/testsuite/ChangeLog:
* gcc.target/i386/adx-check.h: Include stdio.h when DEBUG
is defined.
* gcc.target/i386/avx512fp16-vscalefph-
Hi all,
According to ISE050, published at the end of September, RAO-INT will no longer
be in Grand Ridge. This patch removes it.
The documentation is here:
https://cdrdv2.intel.com/v1/dl/getContent/671368
Regtested on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC13?
Thx
Hi all,
A recent change in the AVX10 documentation allows 64-bit mask
register instructions in AVX10-256. The documentation is here:
Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
https://cdrdv2.intel.com/v1/dl/getContent/784267
The Converged Vecto
Hi all,
This is the v2 patch for the wwwdocs change, addressing the review comments.
If there is no objection, I will push this change next Tuesday.
Changes in v2:
- Remove RAO-INT from Grand Ridge
- Remove the mask register restriction for -mno-evex512
- Arrange the options alphabetically
- Other
After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break
vectorization is supported. The two testcases need to be fixed.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase.
* gcc.target/i386/part-vect-absneghf.c: Ditto.
---
gcc/testsuite/gcc
Hi all,
In invoke.texi, -mevex512 is missing. This patch adds that.
Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
* doc/invoke.texi: Add -mevex512.
---
gcc/doc/invoke.texi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6
Hi all,
Since the Knights Landing and Knights Mill microarchitectures reached EOL in 2019,
and ICC and ICX have already removed support for them and emit errors, we
would also like to remove the support in GCC to reduce maintenance effort.
The deprecated Xeon Phi ISAs are AVX512PF, AVX512ER, AVX5124VNNIW,
Since the Knights Landing and Knights Mill microarchitectures are EOL, we
would like to remove their support in GCC 15. In GCC 14, we will first
emit a warning when they are used.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append "-mno-" for Xeon Phi ISAs.
Hi all,
This patch mentions the following x86_64 backend changes in wwwdocs:
- AVX10.1 support
- APX EGPR, PUSH2POP2, PPX and NDD support
- Xeon Phi ISAs deprecated
Also, I adjusted the wording of the x86_64 part for GCC 13. Ok for gcc-wwwdocs?
Thx,
Haochen
Mention AVX10.1 support, APX su
Hi all,
This patch fixes the previously introduced testcase failure.
Approved in another thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640288.html
Pushed to trunk.
Thx,
Haochen
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110790-2.c: Change scan-assembler from shrq
Hi all,
This patch aims to fix the wrong isa attribute that caused the regression
in PR111907.
Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
PR target/111907
* config/i386/i386.md (avx_noavx512vl): Add missing definition.
* config/i386/sse.md
Hi all,
This RFC patch adds AVX10.1 options. Since we have added -m[no-]evex512
support, adding them is much easier than in the August version.
Details of AVX10 are in:
Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
It describes the Intel Advan
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Add avx10_set and version and detect avx10.1.
(cpu_indicator_init): Handle avx10.1-512.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_256_SET): New.
(OPTION_MASK_
Hi all,
This patch should fix the issue reported in PR112643.
Also, I fixed some legacy issues in the AVX512/AVX10-related code.
Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
PR target/112643
* config/i386/driver-i386.cc (check_avx10_avx512_features):
Re
Hi all,
The AVX10 documentation specifies ecx = 0 for the AVX10 version and
vector size under CPUID leaf 0x24 (subleaf 0). Although all bits for
ecx=1 are reserved for now, we still need to set ecx to 0 explicitly to
avoid a stale value in ecx.
Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC
Hi all,
For AMX instructions with memory operands, we will treat the memory
size as unspecified, since there are no differing sizes that could make
the memory operand ambiguous.
This changes the output in Intel syntax mode, which is currently broken when
used with the assembler, and aligns with current binutils behavior
Hi all,
For compile tests, we should generate valid asm except for special purposes.
Fix the compile tests that generate invalid asm.
Regtested on x86-64-pc-linux-gnu. Ok for trunk?
Thx,
Haochen
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-egprs-names.c: Use ax for short and
a
Hi all,
I revised the patch according to the comment.
Ok for trunk?
Thx,
Haochen
---
Changes in v2: Add suffix for mov to make the test more robust.
---
For compile tests, we should generate valid asm except for special purposes.
Fix the compile tests that generate invalid asm.
gcc/testsuite
Previously, we used 16:11:8 in the generic tune for Intel processors, which
led to loops crossing cache lines and caused random performance penalties,
varying from commit to commit, in benchmarks with small loops.
Always aligning to 16 bytes should alleviate
the issue.
gcc/ChangeLog:
We plan to backport it to GCC 14.2.
Thx,
Haochen
Haochen Jiang (1):
Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
liuhongt (1):
Align tight&hot loop without considering max skipping bytes.
gcc/config/i386/i386.cc | 148 ++-
g
From: liuhongt
When a hot loop is small enough to fit into one cache line, we should align
the loop with ceil_log2 (loop_size) without considering the maximum
skip bytes. This helps code prefetch.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change
gen_pad to
Hi all,
Since vpermq is really slow, we should avoid using it in
ix86_expand_vecop_qihi2 when it is the only instruction that could be used.
Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
PR target/115069
* config/i386/i386-expand.cc (i
Hi all,
This is the v2 patch to fix PR115069. The new testcase has passed.
Changes in v2:
- Added a testcase.
- Change the comment for the early exit.
Thx,
Haochen
Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is not available (needs AVX512BW) for ix86_e
Hi all,
This is the v3 patch to fix PR115069. The new testcase has passed.
Changes in v3:
- Simplify the testcase.
Changes in v2:
- Add a testcase.
- Change the comment for the early exit.
Thx,
Haochen
Since vpermq is really slow, we should avoid using it for permutation
when vpmovwb is
Hi all,
Since AVX10 is the first major ISA introduced after AVX-512, we propose
to add target_clones support for it.
Although AVX10.1-256 does not cover the 512-bit part of AVX512F, this is
not an issue, since it is only used for priority, not for implication.
Bootstrapped and regtested on x86_64-pc-
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmadd__mask3): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c:
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_built
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(avx512fp16_fix_trunc2):
Ex
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmaddsub__mask): Add cond
Hi all,
The initial patch for AVX10.2 was merged this week.
For the upcoming patches, we will first upstream the ymm rounding control part.
In the ymm rounding part, ALL the AVX512 instructions with 512-bit rounding
control also get 256-bit rounding control in AVX10.2.
For clarity, the
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_built
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(unspec_fix_truncv8sfv8si2):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fnmsub__mask3): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c:
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* g
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* g
From: "Hu, Lin1"
gcc/ChangeLog:
* config.gcc: Add avx10_2roundingintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(_fmsub__mask): Add conditi
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/sse.md:
(_scalef): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c:
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(reducep):
Add condition check.
(_rndscale): Ditto.
gcc/testsuite/ChangeLog:
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* g
Hi all,
The AVX10.2 ymm rounding patches were merged to trunk about
6 hours ago. As mentioned before, the next step is support for the new
AVX10.2 instructions.
This patch series can be divided into three parts.
The first patch refactors m512-check.h in the testsuite to reuse the
AVX-512 helper fun
After the introduction of AVX10, we still want to use the AVX512 helper functions
to avoid duplicating code. To reuse them, we need some refactoring
to make sure each function is defined under the correct ISA, avoiding ABI
warnings.
gcc/testsuite/ChangeLog:
* gcc.target/i386/m512-check.h: Wr
gcc/ChangeLog:
* config/i386/avx10_2-512mediaintrin.h: Add new intrins.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT16 and
: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
Co-authored-by: Haochen Jiang
---
gcc/config.gcc | 3 +-
gcc/config/i386/avx10_2-512mediaintrin.h | 234 +++
gcc/config/i386
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(avx10_2_vcvttpd2dqs):
New.
(avx10_2_vcvttpd2qqs):
Ditto.
(avx10_2_vcvttps2dqs):
Ditto.
(avx10_2_vcvttps2qqs):
From: "Hu, Lin1"
gcc/ChangeLog:
* config.gcc: Add avx10_2satcvtintrin.h and
avx10_2-512satcvtintrin.h.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI),
(V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI),
(V16
From: konglin1
gcc/ChangeLog:
* config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h.
* config/i386/i386-builtin-types.def: Add new
DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF,
V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF,
V8BF_FTYPE_V8BF_V8
From: konglin1
gcc/ChangeLog:
* config/i386/avx10_2-512bf16intrin.h: Add new intrinsics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE
for new type.
* config/i386/i386-builtin.def (BDESC): Add ne
* gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto.
Co-authored-by: Lin Hu
Co-authored-by: Haochen Jiang
---
gcc/config.gcc | 3 +-
gcc/config/i3
From: "Zhang, Jun"
gcc/ChangeLog:
* config.gcc: Add avx10_2copyintrin.h.
* config/i386/i386.md (avx10_2): New isa attribute.
* config/i386/immintrin.h: Include avx10_2copyintrin.h.
* config/i386/sse.md
(sse_movss_): Add new constraints to handle AVX
Since BF8 and FP16 have the same number of exponent bits, the type conversion
between them only involves the fraction part. We will use a sequence
of instructions instead of new instructions to do that. For convenience,
intrinsics are also provided.
gcc/ChangeLog:
* config/i386/avx10_2-512convertintri
in): Ditto.
(ix86_expand_builtin): Change function call.
* config/i386/i386.md (UNSPEC_COMX): New unspec.
* config/i386/sse.md
(avx10_2_vcomx): New.
(_comi): Add HFmode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-compare-1.c: New test.
Co-authored-by: Hao
gcc/ChangeLog:
* config/i386/sse.md (VI1_AVX512VNNIBW): New.
(VI2_AVX10_2): Ditto.
(sdot_prod): Add AVX10.2
to auto vectorize and combine 512 bit part.
(udot_prod): Ditto.
(sdot_prodv64qi): Removed.
(udot_prodv64qi): Ditto.
(usdot_pro
From: "Hu, Lin1"
Currently, for !__builtin_isunordered (a, b) && (a != b), GCC
emits:
ucomiss %xmm1, %xmm0
movl $1, %ecx
setp %dl
setnp %al
cmovne %ecx, %edx
andl %edx, %eax
movzbl %al, %eax
In fact,
xorl %eax, %eax
ucomiss %xmm1, %xmm0
setne %al
is better.
gcc/
Hi all,
I committed the AVX10.2 new instruction patches to trunk a few hours
ago. The next and final part of the AVX10.2 upstreaming is to optimize code
with the new AVX10.2 instructions.
This patch series contains the following optimizations:
- VNNI instruction auto-vectorization (PATCH 1).
From: Levy Hsu
gcc/ChangeLog:
* config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test.
---
gcc/config/i386/sse.md
From: Levy Hsu
gcc/ChangeLog:
* config/i386/sse.md: Expand VF2H to VF2HB with VBF modes.
---
gcc/config/i386/sse.md | 13 -
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b374783429c..2de592a9c8f 100644
---
From: "Hu, Lin1"
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to
support the optimization.
* config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ.
* config/i386/i386.md (*cmpx): New define_insn.
(*cmpxhf): Di
From: Levy Hsu
gcc/ChangeLog:
* config/i386/sse.md
(3): New define expand pattern for BF smaxmin.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test.
---
gcc/config/i3
From: Levy Hsu
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode
for int mask cmp.
* config/i386/sse.md (vec_cmp): New
vec_cmp expand for VBF modes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c:
From: Levy Hsu
AVX10.2 introduces several non-exception instructions for BF16 vectors.
Enable vectorized BF add/sub/mul/div operations by supporting the standard
optabs for them.
gcc/ChangeLog:
* config/i386/sse.md (div3): New expander for BFmode div.
(VF_BHSD): New mode iterator with
Hi all,
Sorry for the noise: I mistyped gcc-patches as gcc-patchs, so I am resending
the patch.
This patch adds documentation for recent updates to the x86-64 backend.
Ok for wwwdocs trunk?
Thx,
Haochen
---
Mention AVX10.2 support and Xeon Phi removal in GCC 15.
---
htdocs/gcc-15/changes.html
Hi all,
For prefetchi instructions, the operand is explicitly specified as a
RIP-relative address, and the assembler enforces that rule strictly. This
makes an instruction like:
prefetchit0 bar
illegal for the assembler, although it should be a common usage of prefetchi.
Explicitly add (%rip) after funct
Hi all,
I tested with %a and it works, so I think it is the better solution.
Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk and backport
to GCC 13 and 14?
Thx,
Haochen
---
Changes in v2: Use %a in pattern
---
For prefetchi instructions, RIP-relative address is explici
Hi all,
There are several typos in the AVX512 intrinsic macro definitions. They
result in errors with -O0. This patch fixes that.
Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC14,
GCC 13 and GCC 12?
Thx,
Haochen
gcc/ChangeLog:
* config/i386/avx512dqintrin.
Hi all,
Under -O0, with the "newly" introduced intrinsics, the variable is
transformed into a mem instead of the original symbol_ref. The compiler
then treats the operand as invalid and turns the operation into a nop, which
is not expected. Use a macro when not optimizing to keep the variable as
symbol_re
Hi all,
I have added related testcases into the patch.
Ok for trunk and backport to GCC 14, GCC 13 and GCC 12?
Thx,
Haochen
---
Changes in v2: Add related testcases
---
There are several typos in the AVX512 intrinsic macro definitions. Correct
them to fix errors when compiling with -O0.
gcc/ChangeLog
Hi all,
I added related O0 testcase in this patch.
Ok for trunk and backport to GCC 14 and GCC 13?
Thx,
Haochen
---
Changes in v2: Add testcases.
---
Under -O0, with the "newly" introduced intrinsics, the variable is
transformed into a mem instead of the original symbol_ref. The compiler will
th
Hi all,
In GCC 13/12, there is no _mm_avx512_setzero_ps/d since it was introduced
in GCC 14.
Fix the backport issue reported here as obvious:
https://gcc.gnu.org/pipermail/gcc-regression/2024-July/080385.html
Thx,
Haochen
gcc/ChangeLog:
* config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use
Hi all,
The AVX10.2 technical details were just published on July 31st at the
following link:
https://cdrdv2.intel.com/v1/dl/getContent/828965
The new features and instructions can be divided into two parts:
one is ymm rounding control, the other is the new instructions.
In the following week
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features): Handle
avx10.2.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): New.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET):
Hi all,
While checking the GCC 14 documentation, I found that MCore forgot to uncomment
the title of their part, which caused their documentation to be mixed into the x86 part.
Uncomment it and commit as obvious.
Thx,
Haochen
---
htdocs/gcc-14/changes.html | 2 +-
1 file changed, 1 insertion(+), 1 deletio
Hi all,
This patch fixes a bug in the mapping that caused auto dispatch to fail.
Sierra Forest is in the processor_types enum, but not in processor_subtypes.
Committed as obvious and backported to GCC 13.
Thx,
Haochen
gcc/ChangeLog:
* common/config/i386/i386-common.cc (processor_alias_table):
Hi all,
When -mavx10.1-256 is used on the command line together with avx10.1-256 in a
target attribute, zmm should never be generated. However, current
GCC generates zmm since it wrongly enables EVEX512 for AVX512 that is not
explicitly set. This patch fixes that issue.
Regtested on x86_64-pc-linux-gnu.
Hi all,
The array index must stay below 8 for v8hi; otherwise the test fails
under -O0 or with -fstack-protector.
This patch fixes that, as reported in PR110621.
Commit as obvious and backport to GCC13.
Thx,
Haochen
gcc/testsuite/ChangeLog:
PR target/110621
* gcc.targe