https://gcc.gnu.org/g:affd77d3fe7bfb525b3fb23316d164e847ed02d1
commit r15-167-gaffd77d3fe7bfb525b3fb23316d164e847ed02d1
Author: liuhongt
Date: Wed Mar 27 08:20:13 2024 +0800
Update libbid according to the latest Intel Decimal Floating-Point Math
Library.
The Intel Decimal Floatin
https://gcc.gnu.org/g:fa911365490a7ca308878517a4af6189ffba7ed6
commit r15-235-gfa911365490a7ca308878517a4af6189ffba7ed6
Author: liuhongt
Date: Wed Dec 20 11:43:25 2023 +0800
Support dot_prod optabs for 64-bit vector.
gcc/ChangeLog:
PR target/113079
* c
https://gcc.gnu.org/g:8b974f54393ab2d2d16a0051a68c155455a92aad
commit r15-236-g8b974f54393ab2d2d16a0051a68c155455a92aad
Author: liuhongt
Date: Mon Jan 8 15:13:41 2024 +0800
Extend usdot_prodv*qi with vpmaddwd when AVXVNNI/AVX512VNNI is not
available.
gcc/ChangeLog:
https://gcc.gnu.org/g:a9f642783853b60bb0a59562b8ab3ed10ec01641
commit r15-234-ga9f642783853b60bb0a59562b8ab3ed10ec01641
Author: liuhongt
Date: Wed Dec 20 11:54:43 2023 +0800
Optimize 64-bit vector permutation with punpcklqdq + 128-bit vector pshuf.
gcc/ChangeLog:
https://gcc.gnu.org/g:a71f90c5a7ae2942083921033cb23dcd63e70525
commit r15-499-ga71f90c5a7ae2942083921033cb23dcd63e70525
Author: Levy Hsu
Date: Thu May 9 16:50:56 2024 +0800
x86: Add 3-instruction subroutine vector shift for V16QI in
ix86_expand_vec_perm_const_1 [PR107563]
Hi All
https://gcc.gnu.org/g:0cc0956b3bb8bcbc9196075b9073a227d799e042
commit r15-529-g0cc0956b3bb8bcbc9196075b9073a227d799e042
Author: liuhongt
Date: Tue May 14 18:39:54 2024 +0800
Optimize ashift >> 7 to vpcmpgtb for vector int8.
Since there is no corresponding instruction, the shift op
https://gcc.gnu.org/g:090714e6cf8029f4ff8883dce687200024adbaeb
commit r15-530-g090714e6cf8029f4ff8883dce687200024adbaeb
Author: liuhongt
Date: Wed May 15 10:56:24 2024 +0800
Set d.one_operand_p to true when TARGET_SSSE3 in
ix86_expand_vecop_qihi_partial.
pshufb is available under
https://gcc.gnu.org/g:0ebaffccb294d90184ad78367de66b6307de3ac0
commit r15-717-g0ebaffccb294d90184ad78367de66b6307de3ac0
Author: liuhongt
Date: Fri Mar 22 14:40:00 2024 +0800
Use pblendw instead of pand to clear upper 16 bits.
For vec_pack_truncv8si/v4si w/o AVX512,
(const_vect
https://gcc.gnu.org/g:bb42c551905024ea23095a0eb7b58fdbcfbcaef6
commit r15-3058-gbb42c551905024ea23095a0eb7b58fdbcfbcaef6
Author: liuhongt
Date: Tue Aug 20 14:41:00 2024 +0800
Align predicates for operands[1] between mov and *mov_internal.
> It's not obvious to me why movv16qi req
https://gcc.gnu.org/g:6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
commit r15-3078-g6ea25c041964bf63014fcf7bb68fb1f5a0a4e123
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_
https://gcc.gnu.org/g:27dc1533b6dfc49f3912c524db51d6c372a5ac3d
commit r14-10608-g27dc1533b6dfc49f3912c524db51d6c372a5ac3d
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128
https://gcc.gnu.org/g:aea374238cec1a1e53fb79575d2f998e16926999
commit r13-8987-gaea374238cec1a1e53fb79575d2f998e16926999
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128_
https://gcc.gnu.org/g:b4bc34db3f2948e37ad55a09870635e88c54c7d3
commit r12-10682-gb4bc34db3f2948e37ad55a09870635e88c54c7d3
Author: liuhongt
Date: Thu Aug 15 12:54:07 2024 +0800
Align ix86_{move_max,store_max} with vectorizer.
When none of mprefer-vector-width, avx256_optimal/avx128
https://gcc.gnu.org/g:ea9c508927ec032c6d67a24df59ffa429e4d3d95
commit r13-8988-gea9c508927ec032c6d67a24df59ffa429e4d3d95
Author: liuhongt
Date: Thu Aug 22 14:31:40 2024 +0800
Fix testcase failure.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Use
https://gcc.gnu.org/g:141d8aa375ea32c05f0d437828e6a76f1a3ea4af
commit r12-10683-g141d8aa375ea32c05f0d437828e6a76f1a3ea4af
Author: liuhongt
Date: Thu Aug 22 14:31:40 2024 +0800
Fix testcase failure.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Use
https://gcc.gnu.org/g:ab214ef734bfc3dcffcf79ff9e1dd651c2b40566
commit r15-3314-gab214ef734bfc3dcffcf79ff9e1dd651c2b40566
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:ba9a3f105ea552a22d08f2d54dfdbef16af7c99e
commit r14-10625-gba9a3f105ea552a22d08f2d54dfdbef16af7c99e
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:5e049ada87842947adaca5c607516396889f64d6
commit r13-8999-g5e049ada87842947adaca5c607516396889f64d6
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:6585b06303d8fd9da907f443fc0da9faed303712
commit r12-10694-g6585b06303d8fd9da907f443fc0da9faed303712
Author: liuhongt
Date: Thu Aug 29 11:39:20 2024 +0800
Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
https://gcc.gnu.org/g:a51f2fc0d80869ab079a93cc3858f24a1fd28237
commit r15-3498-ga51f2fc0d80869ab079a93cc3858f24a1fd28237
Author: liuhongt
Date: Wed Sep 4 15:39:17 2024 +0800
Handle const0_operand for *avx2_pcmp3_1.
*_eq3_1 supports
nonimm_or_0_operand for op1 and op2, pass_com
https://gcc.gnu.org/g:c726a6643125a59e2ba6f992924a2d0098104578
commit r15-3558-gc726a6643125a59e2ba6f992924a2d0098104578
Author: liuhongt
Date: Fri Sep 6 15:03:16 2024 +0800
Don't force_reg operands[3] when it's not const0_rtx.
It fixes the regression by
a51f2fc0d80869ab079
https://gcc.gnu.org/g:f80e4ba94e41410219bdcdb1a0f204ea3f148666
commit r15-3579-gf80e4ba94e41410219bdcdb1a0f204ea3f148666
Author: liuhongt
Date: Tue Sep 10 15:04:58 2024 +0800
Enable tune fuse_move_and_alu for GNR.
According to Intel Software Optimization Manual[1], the Redwood cov
https://gcc.gnu.org/g:aac00d09859cc5934bd0f7493d537b8430337773
commit r15-1638-gaac00d09859cc5934bd0f7493d537b8430337773
Author: liuhongt
Date: Thu Jun 20 12:41:13 2024 +0800
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x
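The identity behind this optimization can be checked with a small model. This is only an illustrative sketch, not the GCC implementation: the function names are hypothetical, and it relies on Python's `>>` being an arithmetic (sign-propagating) shift on negative integers, restricted here to values in the signed 32-bit range.

```python
# Hypothetical helpers illustrating x < 0 ? -1 : 0 == (signed) x >> 31
# for values that fit in a signed 32-bit integer.

def sign_mask_select(x):
    """The conditional form: all-ones (-1) for negative x, else 0."""
    return -1 if x < 0 else 0

def sign_mask_shift(x):
    """The shift form: arithmetic right shift by 31 replicates the sign bit."""
    return x >> 31

# Sample values spanning the signed 32-bit range, including both extremes.
samples = [0, 1, -1, 2**31 - 1, -2**31, 12345, -12345]
results = [(sign_mask_select(x), sign_mask_shift(x)) for x in samples]

for x, (sel, shf) in zip(samples, results):
    print(f"{x:>12}: select={sel:>2} shift={shf:>2}")
```

Both columns agree for every sample, which is why the comparison-and-select can be replaced by a single arithmetic shift.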
https://gcc.gnu.org/g:b8153b5417bed02f47354a14ad36100785dfdc47
commit r15-1673-gb8153b5417bed02f47354a14ad36100785dfdc47
Author: liuhongt
Date: Mon Jun 24 17:53:22 2024 +0800
Fix wrong cost of MEM when addr is a lea.
416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8
https://gcc.gnu.org/g:5e1a9f4ccff390ae79a9b9d0d39b325f2b4ea925
commit r15-1733-g5e1a9f4ccff390ae79a9b9d0d39b325f2b4ea925
Author: liuhongt
Date: Wed Jun 26 11:17:46 2024 +0800
Define mask as extern instead of uninitialized local variables.
The testcases are supposed to scan for vpo
https://gcc.gnu.org/g:8e1fa107a63b2e160b6bf69de4fe163dd3cebd80
commit r15-1734-g8e1fa107a63b2e160b6bf69de4fe163dd3cebd80
Author: liuhongt
Date: Wed Jun 26 13:07:31 2024 +0800
Extend lshifrtsi3_1_zext to ?k alternative.
late_combine will combine lshift + zero into *lshifrtsi3_1_zex
https://gcc.gnu.org/g:e62ea4fb8ffcab06ddd02f26db91b29b7270743f
commit r15-1735-ge62ea4fb8ffcab06ddd02f26db91b29b7270743f
Author: liuhongt
Date: Wed Jun 26 13:52:24 2024 +0800
Enable flate-combine.
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
define tar
https://gcc.gnu.org/g:2e2dfa0095c3326a0a5fc2ff175918b42eeb044f
commit r15-1736-g2e2dfa0095c3326a0a5fc2ff175918b42eeb044f
Author: liuhongt
Date: Mon Jun 17 17:16:46 2024 +0800
Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)]
UNSPEC_BLENDV)
These define_insn_a
https://gcc.gnu.org/g:b06a108f0fbffe12493b527224f6e4131a72beac
commit r15-1737-gb06a108f0fbffe12493b527224f6e4131a72beac
Author: liuhongt
Date: Tue Jun 18 14:03:42 2024 +0800
Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false}
is vector -1/0.
gcc/ChangeLog
https://gcc.gnu.org/g:3cb204046c0db899750aee9480af4f1953a40ac3
commit r15-1739-g3cb204046c0db899750aee9480af4f1953a40ac3
Author: liuhongt
Date: Wed Jun 19 13:12:00 2024 +0800
Add more splitter for mskmov with avx512 comparison.
gcc/ChangeLog:
PR target/115517
https://gcc.gnu.org/g:e94e6ee495d95f29355bbc017214228a5e367638
commit r15-1740-ge94e6ee495d95f29355bbc017214228a5e367638
Author: liuhongt
Date: Wed Jun 19 16:05:58 2024 +0800
Adjust testcase for the regressed testcases after obsolete of vcond{,u,eq}.
> Richard suggests that we imp
https://gcc.gnu.org/g:09737d9605521df9232d9990006c44955064f44e
commit r15-1738-g09737d9605521df9232d9990006c44955064f44e
Author: liuhongt
Date: Tue Jun 18 15:52:02 2024 +0800
Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.
These versions of the min/max patterns implement exactly th
https://gcc.gnu.org/g:2ccdd0f22312a14ac64bf944fdc4f8e7532eb0eb
commit r15-1741-g2ccdd0f22312a14ac64bf944fdc4f8e7532eb0eb
Author: liuhongt
Date: Thu Jun 20 12:41:13 2024 +0800
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x
https://gcc.gnu.org/g:55f80c690c5fa59836646565a9dee2a3f68374a0
commit r15-1742-g55f80c690c5fa59836646565a9dee2a3f68374a0
Author: liuhongt
Date: Mon Jun 24 09:19:01 2024 +0800
Remove vcond{,u,eq} expanders since they will be obsolete.
gcc/ChangeLog:
PR target/11551
https://gcc.gnu.org/g:239ad907b1fc08874042f8bea5f61eaf3ba2877d
commit r15-1806-g239ad907b1fc08874042f8bea5f61eaf3ba2877d
Author: liuhongt
Date: Wed Jul 3 14:47:33 2024 +0800
Move runtime check into a separate function and guard it with target
("no-avx")
The patch can avoid SIGILL
https://gcc.gnu.org/g:699087a16591adfdf21228876b6c48dbcd353faa
commit r15-1836-g699087a16591adfdf21228876b6c48dbcd353faa
Author: liuhongt
Date: Thu Jul 4 13:57:32 2024 +0800
Use __builtin_cpu_support instead of __get_cpuid_count.
gcc/testsuite/ChangeLog:
PR target
https://gcc.gnu.org/g:a910c30c7c27cd0f6d2d2694544a09fb11d611b9
commit r15-1888-ga910c30c7c27cd0f6d2d2694544a09fb11d611b9
Author: H.J. Lu
Date: Tue Apr 26 11:08:55 2022 -0700
x86: Update branch hint for Redwood Cove.
According to Intel® 64 and IA-32 Architectures Optimization Refer
https://gcc.gnu.org/g:23ab7f632f4f5bae67fb53cf7b18fea7ba7242c4
commit r15-1905-g23ab7f632f4f5bae67fb53cf7b18fea7ba7242c4
Author: liuhongt
Date: Mon Jul 8 10:35:35 2024 +0800
Rename __{float,double}_u to __x86_{float,double}_u to avoid polluting the
namespace.
I have a build failu
https://gcc.gnu.org/g:e1427b39d28f382d21e7a0ea1714b3250e0a6e5d
commit r12-10617-ge1427b39d28f382d21e7a0ea1714b3250e0a6e5d
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pend
https://gcc.gnu.org/g:9a1cdaa5e8441394d613f5f3401e7aab21efe8f0
commit r13-8913-g9a1cdaa5e8441394d613f5f3401e7aab21efe8f0
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pendi
https://gcc.gnu.org/g:13bfc385b0baebd22aeabb0d90915f2e9b18febe
commit r14-10422-g13bfc385b0baebd22aeabb0d90915f2e9b18febe
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pend
https://gcc.gnu.org/g:f27bf48e0204524ead795fe618cd8b1224f72fd4
commit r15-2038-gf27bf48e0204524ead795fe618cd8b1224f72fd4
Author: liuhongt
Date: Fri Jul 12 09:39:23 2024 +0800
Fix SSA_NAME leak due to def_stmt is removed before use_stmt.
- _5 = __atomic_fetch_or_8 (&set_work_pendi
https://gcc.gnu.org/g:1fff665a51e221a578a92631fc8ea62dd79fa3b6
commit r14-10425-g1fff665a51e221a578a92631fc8ea62dd79fa3b6
Author: H.J. Lu
Date: Tue Apr 26 11:08:55 2022 -0700
x86: Update branch hint for Redwood Cove.
According to Intel® 64 and IA-32 Architectures Optimization Refe
https://gcc.gnu.org/g:228972b2b7bf50f4776f8ccae0d7c2950827d0f1
commit r15-2127-g228972b2b7bf50f4776f8ccae0d7c2950827d0f1
Author: liuhongt
Date: Tue Jul 16 15:29:01 2024 +0800
Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV
gcc/ChangeLog:
PR target/115843
https://gcc.gnu.org/g:a3f03891065cb9691f6e9cebce4d4542deb92a35
commit r15-2217-ga3f03891065cb9691f6e9cebce4d4542deb92a35
Author: liuhongt
Date: Mon Jul 22 11:36:59 2024 +0800
Relax ix86_hardreg_mov_ok after split1.
ix86_hardreg_mov_ok is added by r11-5066-gbe39636d9f68c4
https://gcc.gnu.org/g:618e34d56cc38e9c3ae95a413228068e53ed76bb
commit r14-9459-g618e34d56cc38e9c3ae95a413228068e53ed76bb
Author: liuhongt
Date: Wed Mar 13 10:40:01 2024 +0800
i386[stv]: Handle REG_EH_REGION note
When we split
(insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
https://gcc.gnu.org/g:bdbcfbfcf591381f0faf95c881e3772b56d0a404
commit r13-8438-gbdbcfbfcf591381f0faf95c881e3772b56d0a404
Author: liuhongt
Date: Wed Mar 13 10:40:01 2024 +0800
i386[stv]: Handle REG_EH_REGION note
When we split
(insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
https://gcc.gnu.org/g:a861f940efffae2782c559cd04df2d2740cd28bd
commit r12-10214-ga861f940efffae2782c559cd04df2d2740cd28bd
Author: liuhongt
Date: Wed Mar 13 10:40:01 2024 +0800
i386[stv]: Handle REG_EH_REGION note
When we split
(insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
https://gcc.gnu.org/g:942d470a5a4fb1baeff943127a81b441dffaa543
commit r14-9512-g942d470a5a4fb1baeff943127a81b441dffaa543
Author: liuhongt
Date: Fri Mar 15 10:59:10 2024 +0800
Add missing hf/bf patterns.
It will be used by copysignm3/xorsignm3/lroundmn2 expanders.
gcc/Chan
https://gcc.gnu.org/g:415091f09096a0ebba1fdcd4af8c2fda24cfd411
commit r14-9588-g415091f09096a0ebba1fdcd4af8c2fda24cfd411
Author: liuhongt
Date: Mon Mar 18 18:53:59 2024 +0800
Document -fexcess-precision=16.
gcc/ChangeLog:
PR middle-end/114347
* doc/inv
https://gcc.gnu.org/g:ac2f8c2a367151fc0410f904339c475a953cffc8
commit r14-9591-gac2f8c2a367151fc0410f904339c475a953cffc8
Author: liuhongt
Date: Thu Mar 21 13:15:23 2024 +0800
Fix runtime error for nonlinear iv vectorization(step_mult).
wi::from_mpz doesn't take a sign argument, we
https://gcc.gnu.org/g:199b021a38f30b681e0dbecd2d0296beabd50b13
commit r13-8475-g199b021a38f30b681e0dbecd2d0296beabd50b13
Author: liuhongt
Date: Thu Mar 21 13:15:23 2024 +0800
Fix runtime error for nonlinear iv vectorization(step_mult).
wi::from_mpz doesn't take a sign argument, we
https://gcc.gnu.org/g:9a6c7aa1b011b77fcd9b19f7b8d7ff0fc823cdb2
commit r14-9603-g9a6c7aa1b011b77fcd9b19f7b8d7ff0fc823cdb2
Author: liuhongt
Date: Fri Mar 22 10:09:43 2024 +0800
Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.
Also fixed a typo in the testcase.
https://gcc.gnu.org/g:e6a3d1f5bcfd954b614155d96c97bde8ac230e2e
commit r13-8488-ge6a3d1f5bcfd954b614155d96c97bde8ac230e2e
Author: liuhongt
Date: Fri Mar 22 10:09:43 2024 +0800
Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.
Also fixed a typo in the testcase.
https://gcc.gnu.org/g:c19a674d03847b900919b97d0957c8ae5164f8f1
commit r15-22-gc19a674d03847b900919b97d0957c8ae5164f8f1
Author: liuhongt
Date: Tue Apr 16 08:37:22 2024 +0800
Adjust alternative *k to ?k for avx512 mask in zero_extend patterns
So when both source operand and dest ope
https://gcc.gnu.org/g:bc1fda00d5f20e2f3e77a50b2822562b6e0040b2
commit r15-2395-gbc1fda00d5f20e2f3e77a50b2822562b6e0040b2
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:64ca25aec4939aea79bd812b089fbb666ca6f2fd
commit r15-2539-g64ca25aec4939aea79bd812b089fbb666ca6f2fd
Author: liuhongt
Date: Fri Jul 26 09:56:03 2024 +0800
Fix mismatch between constraint and predicate for ashl3_doubleword.
(insn 98 94 387 2 (parallel [
https://gcc.gnu.org/g:a295076bee293aa3112c615f9af7a27231816a36
commit r14-10551-ga295076bee293aa3112c615f9af7a27231816a36
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:c94738e2462ff46f3013f6270f6a955b749d82b2
commit r12-10668-gc94738e2462ff46f3013f6270f6a955b749d82b2
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:617562e4e422c7bd282960b14abfffd994445009
commit r13-8971-g617562e4e422c7bd282960b14abfffd994445009
Author: liuhongt
Date: Wed Jul 24 11:29:23 2024 +0800
Refine constraint "Bk" to define_special_memory_constraint.
For below pattern, RA may still allocate r162
https://gcc.gnu.org/g:c3c83d22d212a35cb1bfb8727477819463f0dcd8
commit r15-2906-gc3c83d22d212a35cb1bfb8727477819463f0dcd8
Author: liuhongt
Date: Mon Aug 12 14:35:31 2024 +0800
Move ix86_align_loops into a separate pass and insert the pass after
pass_endbr_and_patchable_area.
gcc/C
https://gcc.gnu.org/g:f7e672da8fc3d416a6d07eb01f3be4400ef94fac
commit r15-2930-gf7e672da8fc3d416a6d07eb01f3be4400ef94fac
Author: liuhongt
Date: Mon Aug 12 18:24:34 2024 +0800
Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need
secondary reload.
It results in 2 fail
https://gcc.gnu.org/g:4e7735a8d87559bbddfe3a985786996e22241f8d
commit r14-10588-g4e7735a8d87559bbddfe3a985786996e22241f8d
Author: liuhongt
Date: Mon Aug 12 14:35:31 2024 +0800
Move ix86_align_loops into a separate pass and insert the pass after
pass_endbr_and_patchable_area.
gcc/
https://gcc.gnu.org/g:51f4b47c4f4f61fe31a7bd1fa80e08c2438d76a8
commit r15-814-g51f4b47c4f4f61fe31a7bd1fa80e08c2438d76a8
Author: liuhongt
Date: Fri May 24 09:49:08 2024 +0800
Fix typo in the testcase.
gcc/testsuite/ChangeLog:
PR target/114148
* gcc.targ
https://gcc.gnu.org/g:c65002347e595cda8b15e59e734d209283faf2b6
commit r15-857-gc65002347e595cda8b15e59e734d209283faf2b6
Author: liuhongt
Date: Tue May 28 10:32:12 2024 +0800
Fix predicate mismatch between vfcmaddcph's define_insn and define_expand.
When I applied Roger's patch [1]
https://gcc.gnu.org/g:1d6199e5f8c1c08083eeb0279f71333234fe14ad
commit r15-882-g1d6199e5f8c1c08083eeb0279f71333234fe14ad
Author: liuhongt
Date: Mon Feb 19 13:57:24 2024 +0800
Reduce cost of MEM (A + imm).
For MEM, rtx_cost iterates each subrtx, and adds up the costs,
so for MEM
https://gcc.gnu.org/g:ef27b91b62c3aa8841c02665dffa8914c742fd37
commit r15-919-gef27b91b62c3aa8841c02665dffa8914c742fd37
Author: liuhongt
Date: Tue Feb 27 15:34:57 2024 +0800
Don't reduce estimated unrolled size for innermost loop.
For the innermost loop, after completely loop unro
https://gcc.gnu.org/g:b6c6d5abf0d31c936f50f8f9073c5e335b9e24b7
commit r15-920-gb6c6d5abf0d31c936f50f8f9073c5e335b9e24b7
Author: liuhongt
Date: Wed Feb 28 11:17:10 2024 +0800
Support vcond_mask_qiqi and friends.
gcc/ChangeLog:
* config/i386/sse.md (vcond_mask_): Ne
https://gcc.gnu.org/g:3a873c0a7bc8183de95a6103b507101a25eed413
commit r15-932-g3a873c0a7bc8183de95a6103b507101a25eed413
Author: liuhongt
Date: Thu May 30 14:15:48 2024 +0800
Rename double_u with __double_u to avoid polluting the namespace.
gcc/ChangeLog:
* config/
https://gcc.gnu.org/g:ac306de7d5100d3682eae2270995a9abbe19db38
commit r15-984-gac306de7d5100d3682eae2270995a9abbe19db38
Author: liuhongt
Date: Fri May 31 14:38:07 2024 +0800
Add some preference for floating point rtl ifcvt when sse4.1 is not
available
W/o TARGET_SSE4_1, it takes
https://gcc.gnu.org/g:4d207044195b97ecb27c72a7dc987eb8b86644a0
commit r15-1003-g4d207044195b97ecb27c72a7dc987eb8b86644a0
Author: liuhongt
Date: Tue Jun 4 10:13:09 2024 +0800
Adjust testcase for -march=cascadelake
gcc/testsuite/ChangeLog:
PR target/115299
https://gcc.gnu.org/g:b05288d1f1e4b632eddf8830b4369d4659f6c2ff
commit r15-1022-gb05288d1f1e4b632eddf8830b4369d4659f6c2ff
Author: liuhongt
Date: Tue May 21 16:57:17 2024 +0800
Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.
According to IEEE standard, for conv
https://gcc.gnu.org/g:7876cde25cbd2f026a0ae488e5263e72f8e9bfa0
commit r15-1047-g7876cde25cbd2f026a0ae488e5263e72f8e9bfa0
Author: liuhongt
Date: Fri Apr 19 10:29:34 2024 +0800
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.
When mask is (1 << (prec - imm)
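A per-element model of this simplification can be sketched as follows. The snippet above is cut off, so the exact mask condition is an assumption here: the sketch assumes the mask keeps only the low `prec - imm` bits, that is `(1 << (prec - imm)) - 1`. The 8-bit element width and all function names are hypothetical choices for illustration, not taken from the commit.

```python
PREC = 8  # assumed element precision in bits (illustrative only)

def ashiftrt(a, imm):
    """Arithmetic right shift of a PREC-bit pattern, result as a bit pattern."""
    signed = a - (1 << PREC) if a & (1 << (PREC - 1)) else a
    return (signed >> imm) & ((1 << PREC) - 1)

def lshiftrt(a, imm):
    """Logical right shift of a PREC-bit pattern."""
    return a >> imm

def and_of_ashiftrt_equals_lshiftrt(imm):
    """Check (ashiftrt(a, imm) & mask) == lshiftrt(a, imm) for every a."""
    mask = (1 << (PREC - imm)) - 1  # ASSUMED mask form, see lead-in
    return all((ashiftrt(a, imm) & mask) == lshiftrt(a, imm)
               for a in range(1 << PREC))

checks = {imm: and_of_ashiftrt_equals_lshiftrt(imm) for imm in range(1, PREC)}
print(checks)
```

Under that assumption the mask clears exactly the sign-extension bits that the arithmetic shift introduced, so the AND becomes redundant once the shift is logical.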
https://gcc.gnu.org/g:961dd0d635217c703a38c48903981e0d60962546
commit r15-1048-g961dd0d635217c703a38c48903981e0d60962546
Author: liuhongt
Date: Fri Apr 19 10:39:53 2024 +0800
Adjust rtx_cost for MEM to enable more simplification
For CONST_VECTOR_DUPLICATE_P in constant_pool, it is j
https://gcc.gnu.org/g:fcfce55c85f842ed843cbc4aabe744c6a004dead
commit r15-1050-gfcfce55c85f842ed843cbc4aabe744c6a004dead
Author: liuhongt
Date: Thu Jun 6 11:27:53 2024 +0800
Refine testcase for power10.
For power10, there're extra 3 REG_EQUIV notes with (fix:SI. to avoid
the f
https://gcc.gnu.org/g:b24f2954dbc13d85e9fb62e05a88e9df21e4d4f4
commit r15-1088-gb24f2954dbc13d85e9fb62e05a88e9df21e4d4f4
Author: liuhongt
Date: Fri Jun 7 09:29:24 2024 +0800
Add additional option --param max-completely-peeled-insns=200 for
power64*-*-*
gcc/testsuite/ChangeLog:
https://gcc.gnu.org/g:e4f85ea6271a10e13c6874709a05e04ab0508fbf
commit r13-8825-ge4f85ea6271a10e13c6874709a05e04ab0508fbf
Author: Jan Hubicka
Date: Fri Dec 29 23:51:03 2023 +0100
Disable FMADD in chains for Zen4 and generic
this patch disables use of FMA in matrix multiplication lo
https://gcc.gnu.org/g:5d52558a531130675329d72ca5c4713abf5bf885
commit r12-10497-g5d52558a531130675329d72ca5c4713abf5bf885
Author: Jan Hubicka
Date: Fri Dec 29 23:51:03 2023 +0100
Disable FMADD in chains for Zen4 and generic
this patch disables use of FMA in matrix multiplication l
https://gcc.gnu.org/g:1d496d2cd1d5d8751a1637abca89339d6f9ddd3b
commit r15-1191-g1d496d2cd1d5d8751a1637abca89339d6f9ddd3b
Author: liuhongt
Date: Tue Jun 11 10:23:27 2024 +0800
Fix ICE in rtl check due to CONST_WIDE_INT in CONST_VECTOR_DUPLICATE_P
The patch add extra check to make s
https://gcc.gnu.org/g:f8bf80a4e1682b2238baad8c44939682f96b1fe0
commit r15-1234-gf8bf80a4e1682b2238baad8c44939682f96b1fe0
Author: liuhongt
Date: Thu Jun 13 09:53:58 2024 +0800
Fix ICE due to REGNO of a SUBREG.
Use reg_or_subregno instead.
gcc/ChangeLog:
PR
https://gcc.gnu.org/g:8b69efd9819f86b973d7a550e987ce455fce6d62
commit r15-1307-g8b69efd9819f86b973d7a550e987ce455fce6d62
Author: liuhongt
Date: Mon Jun 3 10:38:19 2024 +0800
Remove one_if_conv for latest Intel processors.
The tune is added by PR79390 for SciMark2 on Broadwell.
https://gcc.gnu.org/g:d3fae2bea034edb001cd45d1d86c5ceef146899b
commit r15-1308-gd3fae2bea034edb001cd45d1d86c5ceef146899b
Author: liuhongt
Date: Tue Jun 11 21:22:42 2024 +0800
Adjust ix86_rtx_costs for pternlog_operand_p.
r15-1100-gec985bc97a0157 improves handling of ternlog instru
https://gcc.gnu.org/g:4c957d7ba84d8bbce6e778048f38e92ef71806c8
commit r15-1563-g4c957d7ba84d8bbce6e778048f38e92ef71806c8
Author: Collin Funk
Date: Mon Jun 10 06:36:47 2024 +
AVX-512: Pacify -Wshift-overflow=2. [PR115409]
A shift of 31 on a signed int is undefined behavior. Si
https://gcc.gnu.org/g:fe0692f689a18c432d6f59f404d4cd020cbebef2
commit r14-10782-gfe0692f689a18c432d6f59f404d4cd020cbebef2
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come fro
https://gcc.gnu.org/g:9b7d5ecbecfbd193899648e411f1a9b2a77471e2
commit r14-10783-g9b7d5ecbecfbd193899648e411f1a9b2a77471e2
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Int
https://gcc.gnu.org/g:e9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c
commit r13-9117-ge9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come from
https://gcc.gnu.org/g:eecd5f8ce1729a214bf0a1edfdd3ee1cf79be881
commit r13-9118-geecd5f8ce1729a214bf0a1edfdd3ee1cf79be881
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Inte
https://gcc.gnu.org/g:79e7e02b7cc578d03eab2b50c029f44409ef8e26
commit r14-10807-g79e7e02b7cc578d03eab2b50c029f44409ef8e26
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines
https://gcc.gnu.org/g:5259d3927c1c8e3a15b4b844adef59b48c241233
commit r15-4510-g5259d3927c1c8e3a15b4b844adef59b48c241233
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines v
https://gcc.gnu.org/g:fca35b417c236e3448bc3666820fd1ba423fe6e9
commit r13-9139-gfca35b417c236e3448bc3666820fd1ba423fe6e9
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines v
https://gcc.gnu.org/g:91800a70a2af1349eefc5f3380be2b254b1db395
commit r12-10778-g91800a70a2af1349eefc5f3380be2b254b1db395
Author: liuhongt
Date: Wed Oct 16 13:43:48 2024 +0800
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines
https://gcc.gnu.org/g:8b43518a01cbbbafe042b85a48fa09a32948380a
commit r13-9142-g8b43518a01cbbbafe042b85a48fa09a32948380a
Author: liuhongt
Date: Tue Oct 22 11:24:23 2024 +0800
[GCC13/GCC12] Fix testcase.
The optimization relies on other patterns which are only available at
GCC1
https://gcc.gnu.org/g:45bde60836d04cce4637b74ecadbb0aff90b832f
commit r12-10781-g45bde60836d04cce4637b74ecadbb0aff90b832f
Author: liuhongt
Date: Tue Oct 22 11:24:23 2024 +0800
[GCC13/GCC12] Fix testcase.
The optimization relies on other patterns which are only available at
GCC
https://gcc.gnu.org/g:70c3db511ba14ff5fa68cb41d0714a9fb957ea5d
commit r15-4225-g70c3db511ba14ff5fa68cb41d0714a9fb957ea5d
Author: liuhongt
Date: Mon Mar 25 21:28:14 2024 -0700
Enable vectorization for unknown tripcount in very cheap cost model but
disable epilog vectorization.
gcc
https://gcc.gnu.org/g:d5d1189c12199db79f6feb5cfcc7e6475c3a4d91
commit r15-4226-gd5d1189c12199db79f6feb5cfcc7e6475c3a4d91
Author: liuhongt
Date: Thu Sep 19 13:38:34 2024 +0800
Adjust testcase after relax O2 vectorization.
gcc/testsuite/ChangeLog:
* gcc.dg/fstack-pr
https://gcc.gnu.org/g:9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c
commit r15-4234-g9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c
Author: liuhongt
Date: Wed Sep 25 13:11:11 2024 +0800
Add a new tune avx256_avoid_vec_perm for SRF.
According to Intel SOM[1], For Crestmont, most 256-bit Inte
https://gcc.gnu.org/g:9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc
commit r15-4233-g9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc
Author: liuhongt
Date: Tue Sep 24 15:53:14 2024 +0800
Add new microarchitecture tune for SRF/GRR/CWF.
For Crestmont, 4-operand vex blendv instructions come from
https://gcc.gnu.org/g:ee7e77e9c121f5a6f27c92b6b24b2abf9cd66a4d
commit r15-4560-gee7e77e9c121f5a6f27c92b6b24b2abf9cd66a4d
Author: liuhongt
Date: Mon Oct 21 02:22:08 2024 -0700
i386: Optimize EQ/NE comparison between avx512 kmask and -1.
r15-974-gbf7745f887c765e06f2e75508f263debb60a
https://gcc.gnu.org/g:b718f6ec1674c0db30f26c65b7a9215e9388dd6c
commit r14-10831-gb718f6ec1674c0db30f26c65b7a9215e9388dd6c
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240
https://gcc.gnu.org/g:2452387468423882c0732e0fad3a83e887574ccc
commit r13-9145-g2452387468423882c0732e0fad3a83e887574ccc
Author: liuhongt
Date: Tue Oct 22 01:54:40 2024 -0700
Fix ICE due to isa mismatch for the builtins.
gcc/ChangeLog:
PR target/117240