Sorry, I should have built the patch while backporting, and thanks for your
report and suggestions.
I'll backport another patch to fix the problems after finishing bootstraps,
probably in a couple of hours.
Thank you!
Lili.
> -Original Message-
> From: Jonathan Wakely
> Sent: Monday, August
/* Alder Lake. */
> case 0xb7:
> +case 0xba:
> + case 0xbf: <<<<<< Newly added same case value
> /* Raptor Lake. */
>
>
> Tobias
>
> On 29.06.23 05:06, Cui, Lili via Gcc-patches wrote:
> > I will directly commit this patch, it c
Committed as obvious, and backported to GCC13.
Lili.
Update model values for Raptorlake according to SDM.
gcc/ChangeLog
* common/config/i386/cpuinfo.h (get_intel_cpu): Add model value 0xba
to Raptorlake.
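
A change like the one in this ChangeLog entry amounts to adding one more case label to a switch over the CPUID model number. The following is only a minimal sketch under assumptions: the enum, the function name classify_model and the 8-bit range check are hypothetical and are not the real get_intel_cpu code in cpuinfo.h. Only the value 0xba comes from the entry above.

```c
#include <assert.h>

/* Hypothetical result type; GCC's cpuinfo.h uses its own enums.  */
enum cpu_kind { CPU_UNKNOWN, CPU_RAPTORLAKE };

/* Map a CPUID model number to a CPU kind.  Each case value may appear
   only once in a switch; repeating a value is a compile-time error,
   which is why a backport that re-adds an existing value fails to build.  */
static enum cpu_kind
classify_model (unsigned int model)
{
  assert (model <= 0xff);   /* assumption: model number fits in 8 bits */
  switch (model)
    {
    case 0xba:              /* model value named in the ChangeLog above */
      return CPU_RAPTORLAKE;
    default:
      return CPU_UNKNOWN;
    }
}
```

Calling classify_model (0xba) selects the Raptorlake branch; any value without a case label falls through to the default.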
---
gcc/common/config/i386/cpuinfo.h | 1 +
1 file changed, 1 insertion(+
> -Original Message-
> From: Hongtao Liu
> Sent: Tuesday, July 4, 2023 4:27 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] x86: Enable ENQCMD and UINTR for march=sierraforest.
>
> On Tue, Jul 4, 2023 at 4:15 PM Cui, Lili wrote:
> >
> > From: Lili Cui
> >
> >
From: Lili Cui
Hi Maintainer,
This patch is to enable ENQCMD and UINTR for march=sierraforest according to
Intel ISE.
Bootstrapped and regtested. Ok for trunk? And I will backport this patch to
GCC13.
Thanks,
Lili.
Enable ENQCMD and UINTR for march=sierraforest according to Intel ISE
https:
> -Original Message-
> From: Richard Biener
> Sent: Thursday, June 29, 2023 2:42 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR gcc/110148:Avoid adding loop-carried ops to long
> chains
>
> On Thu, Jun 29, 2023 at 3:49 AM Cui, Lili wrote:
> >
> > From: Lili
I will directly commit this patch, it can be considered as an obvious patch.
Thanks,
Lili.
> -Original Message-
> From: Gcc-patches On
> Behalf Of Cui, Lili via Gcc-patches
> Sent: Wednesday, June 28, 2023 6:52 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao
>
From: Lili Cui
Hi Maintainer
This patch is to fix TSVC242 regression related to loop-carried ops.
Bootstrapped and regtested. Ok for trunk?
Regards
Lili.
Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-ca
Hi Hongtao,
This patch is to update model values for Alderlake, Rocketlake and Raptorlake
according to SDM.
Ok for trunk?
Thanks.
Lili.
Update model values for Alderlake, Rocketlake and Raptorlake according to SDM.
gcc/ChangeLog
* common/config/i386/cpuinfo.h (get_intel_cpu): Remove
Hi Di,
The compile options I use are: "-march=native -Ofast -funroll-loops -flto"
I re-ran 503, 507, and 527 on two neoverse-n1 machines, and found that one
machine fluctuated greatly, and the score was only 70% of the other machine. I
also couldn't reproduce the gain on the stable machine. For
Committed, thanks Richard.
Lili.
> -Original Message-
> From: Richard Biener
> Sent: Wednesday, May 31, 2023 3:22 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix ICE in rewrite_expr_tree_parallel
>
> On Wed, May 31, 2023 at 3:35 AM Cui, Lili wrote:
> >
> >
Hi,
This patch is to fix ICE in rewrite_expr_tree_parallel.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110038
Bootstrapped and regtested. Ok for trunk?
Regards
Lili.
1. Limit the value of tree-reassoc-width to IntegerRange(0, 256).
2. Add width limit in rewrite_expr_tree_parallel.
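
The effect of the two points above can be pictured as a simple range check. This is a sketch only: the helper name clamp_reassoc_width and the macro are hypothetical, and GCC's option handling may reject out-of-range values instead of clamping them. Only the bounds 0 and 256 come from the description.

```c
#include <assert.h>

/* Upper bound taken from IntegerRange(0, 256) in the description.  */
#define REASSOC_WIDTH_MAX 256

/* Hypothetical helper: force a requested width into [0, 256] so that
   later code never sees a negative or oversized value.  */
static int
clamp_reassoc_width (int requested)
{
  int width = requested;
  if (width < 0)
    width = 0;
  if (width > REASSOC_WIDTH_MAX)
    width = REASSOC_WIDTH_MAX;
  assert (width >= 0 && width <= REASSOC_WIDTH_MAX);
  return width;
}
```

Values already inside the range pass through unchanged, so the check only matters for inputs like the one that triggered the ICE.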
gcc/Change
I will rebase and commit this patch, thanks!
Lili.
> -Original Message-
> From: Cui, Lili
> Sent: Thursday, May 25, 2023 7:30 AM
> To: gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; li...@linux.ibm.com; Cui, Lili
>
> Subject: [PATCH] Handle FMA friendly in reassoc pass
>
>
> > +rewrite_expr_tree_parallel (gassign *stmt, int width, bool has_fma,
> > +const vec
> > +&ops)
> > {
> >enum tree_code opcode = gimple_assign_rhs_code (stmt);
> >int op_num = ops.length ();
> > @@ -5483,10 +5494,11 @@ rewrite_expr_tree_parallel (
From: Lili Cui
Make some changes in reassoc pass to make it more friendly to fma pass later.
Using FMA instead of mult + add reduces register pressure and instruction
retired.
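
For illustration, the kind of code this targets is a chain of multiplies feeding adds. The sketch below is not taken from the patch; the function name dot is hypothetical. Whether the compiler actually forms an FMA depends on the target and the floating-point contraction flags.

```c
#include <assert.h>

/* Dot product: each iteration is one multiply feeding one add, the
   pattern a compiler may contract into a single FMA where the target
   and floating-point flags allow it.  */
static double
dot (const double *a, const double *b, int n)
{
  assert (n >= 0);
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += a[i] * b[i];   /* mult + add: an FMA candidate */
  return acc;
}
```

With a fused operation the intermediate product does not need its own register or its own instruction, which is the register pressure and instruction count saving mentioned above.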
There are mainly two changes
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to g
Attach CPU2017 3 run results:
On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r: Improved by 0.60% for single copy.
507.cactuBSSN_r: Improved by 1.10% for single copy.
519.lbm_r: Improved by 2.21% for single copy.
no measurable changes for other ben
> I think to make a difference you need to hit the number of parallel fadd/fmul
> the pipeline can perform. I don't think issue width is ever a problem for
> chains w/o fma and throughput of fma vs fadd + fmul should be similar.
>
Yes, for x86 backend, fadd, fmul and fma have the same TP meanin
From: Lili Cui
Make some changes in reassoc pass to make it more friendly to fma pass later.
Using FMA instead of mult + add reduces register pressure and instruction
retired.
There are mainly two changes
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to g
> ISTR there were no sufficient comments in the code explaining why
> rewrite_expr_tree_parallel_for_fma is better by design. In fact ...
>
> >
> > >
> > > > if (!reassoc_insert_powi_p
> > > > - && ops.length () > 3
> > > > + && len > 3
>
From: Lili Cui
Add a param for the chain with FMA in reassoc pass to make it more friendly to
the fma pass later. First detect whether this chain is able to
generate more than 2 FMAs; if so, and param_reassoc_max_chain_length_with_fma
is enabled, we will rearrange the ops so that they can be com
> -Original Message-
> From: Richard Biener
> Sent: Thursday, May 11, 2023 6:53 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/2] PR gcc/98350:Add a param to control the length of
> the chain with FMA in reassoc pass
Hi Richard,
Thanks for helping to review the
From: Lili Cui
Set the length of the chain with FMA to 5 for icelake_cost.
With this patch applied,
SPR multi-copy: 508.namd_r increased by 3%
ICX multi-copy: 508.namd_r increased by 3.5%,
507.cactuBSSN_r increased by 3.7%
Using FMA instead of mult + add reduces register pressur
From: Lili Cui
Hi,
These two patches add a param to control the length of the chain with
FMA in the reassoc pass and a tuning option in the backend.
Bootstrapped and regtested. Ok for trunk?
Regards
Lili.
Add a param for the chain with FMA in reassoc pass to make it more friendly to
the fma
From: Lili Cui
Hi Hongtao,
This patch is to enable 256 move by pieces for ALDERLAKE and AVX2.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
gcc/Changelog:
* config/i386/x86-tune.def
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add alderlake and avx2.
Hi Hongtao,
I backported this patch to gcc-12 release.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu):
Move sapphirerapids out of AVX512_VP2INTERSECT.
* config/i386/i386.h: Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS
* doc/invoke.tex
> > > +@item x86-vect-unroll-min-ldst-threshold
> > > +The vectorizer will check with target information to determine
> > > +whether unroll it. This parameter is used to limit the mininum of
> > > +loads and stores in the main loop.
> > >
> > > It's odd to "limit" the minimum number of something.
>
> On 10/20/22 19:52, Cui, Lili via Gcc-patches wrote:
> > Hi Honza,
> >
> > Gentle ping
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html
> >
> > gcc/ChangeLog
> >
> >* ipa-inline-analysis.cc (do_estimate_e
I think 200 still works.
> That said, the heuristic made me think "what the heck". Can we explain in u-
> arch terms why the unrolling is beneficial instead of just defering to SPEC
> CPU 2017 fotonik?
>
Regarding the benefits, I explained in the first answer, I checked 5
Hi Hongtao,
This patch introduces function finish_cost and
determine_suggested_unroll_factor for the x86 backend, so that it can
suggest the unroll factor for a given loop being vectorized.
Referring to the aarch64 and RS6000 backends and based on the analysis of
SPEC2017 performance evaluation resu
Hi Honza,
Gentle ping
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html
gcc/ChangeLog
* ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute
judgement for INLINE_HINT_known_hot hint.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/inlinehint-6.c: New test.
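
The situation this ChangeLog entry covers, a hot caller calling a hot callee without profile feedback, looks roughly like the sketch below. It is not the contents of inlinehint-6.c; the function names scale and sum_scaled are hypothetical. Only the use of the GCC hot attribute on both functions reflects the description.

```c
#include <assert.h>

/* Callee: small helper marked hot.  */
__attribute__ ((hot))
static int
scale (int x)
{
  return 3 * x;
}

/* Caller: also marked hot.  With both sides carrying the attribute,
   the call to scale is the kind of edge the hint is meant to cover.  */
__attribute__ ((hot))
static int
sum_scaled (const int *v, int n)
{
  assert (n >= 0);
  int total = 0;
  for (int i = 0; i < n; i++)
    total += scale (v[i]);
  return total;
}
```

The attribute changes only the compiler's optimization decisions; the computed result is the same whether or not scale is inlined.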
--
atches@gcc.gnu.org
> Subject: Ping^1 [PATCH] Add attribute hot judgement for
> INLINE_HINT_known_hot hint.
>
> Hi Honza,
>
> Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-
> September/601934.html
>
> Thanks,
> Lili.
>
> > -Original Message--
Hi,
I want to add myself to MAINTAINERS for write after approval.
OK for master?
ChangeLog:
* MAINTAINERS (Write After Approval): Add myself.
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 11fa8bc6dbd..e4e7349a6d9 100644
--- a/MAINTA
Hi Hongtao,
This patch is to remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS.
The new Intel ISE removes AVX512_VP2INTERSECT from SAPPHIRERAPIDS;
AVX512_VP2INTERSECT is only supported in Tigerlake.
Hi Uros,
This patch is to remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS.
The new intel ISE
Hi Honza,
Gentle ping
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601934.html
Thanks,
Lili.
> -Original Message-
> From: Gcc-patches On
> Behalf Of Cui, Lili via Gcc-patches
> Sent: Wednesday, September 21, 2022 5:22 PM
> To: Jan Hubicka
> Cc: Lu, Hong
> Thank you. Can you please also add a testcase that tests for this.
> So you modify imagemagick marking attribute hot on the specific inline?
Thanks Honza. Added the testcase. I didn't modify source code of 538.imagic_r,
the original source code has attribute like:
#define magick_hot_spot __a
Hi Honza,
This patch is to add attribute hot judgement for INLINE_HINT_known_hot hint.
We set up INLINE_HINT_known_hot hint only when we have profile feedback,
now add function attribute judgement for it, when both caller and callee
have __attribute__((hot)), we will also set up INLINE_HINT_known
Hi Honza,
Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597891.html
Thanks,
Lili.
> -Original Message-
> From: Gcc-patches On
> Behalf Of Cui, Lili via Gcc-patches
> Sent: Sunday, July 10, 2022 10:05 PM
> To: Jan Hubicka
> Cc: Lu, Hongjiu ; Li
> -Original Message-
> From: Jan Hubicka
> This is interesting idea. Basically we want to guess if inlining will
> make SRA and or strore->load propagation possible. I think the
> solution using INLINE_HINT may be bit too trigger happy, since it is very
> common that this happens and
From: Lili
Hi Hubicka,
This patch is to add a heuristic inline hint to eliminate redundant load and
store.
Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
OK for trunk?
Thanks,
Lili.
Add an INLINE_HINT_eliminate_load_and_store hint to the inline pass.
We accumulate the insn number
This patch is to change dg-options for two testcases.
Use -mtune=generic to limit these two testcases, because configuring them with
-mtune=cascadelake or znver3 will vectorize them.
regtested on x86_64-linux-gnu{-m32,}. Ok for trunk?
Thanks,
Lili.
Use -mtune=generic to limit these two test cas
> -Original Message-
> From: Hongtao Liu
> Sent: Monday, June 6, 2022 1:25 PM
> To: H.J. Lu
> Cc: Cui, Lili ; Liu, Hongtao ; GCC
> Patches
> Subject: Re: [PATCH] Update {skylake,icelake,alderlake}_cost to add a bit
> preference to vector store.
> >
> > Should we add some tests to verify
This patch is to update {skylake,icelake,alderlake}_cost to add a bit
preference to vector store.
Since the integer vector construction cost has changed, we need to adjust the
load and store costs for Intel processors.
With the patch applied
538.imagic_r:gets ~6% improvement on ADL for multicop
Hi Hongtao,
This patch is to correct march=sapphirerapids to be based on icelake server,
and to update sapphirerapids in the documentation.
OK for master and backport to GCC 11?
gcc/Changelog:
PR target/104963
* config/i386/i386.h (PTA_SAPPHIRERAPIDS): change it to base on ICX.
Hi Uros,
This patch is to update Intel architectures ISA support in documentation.
Since the ISAs listed for Intel architectures in the documentation
are inconsistent with the actual ones, update them all.
OK for master?
gcc/Changelog:
* gcc/doc/invoke.texi: Update documents for Intel architectu
Hi Uros,
This patch is to update model value for Alderlake and Rocketlake.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
gcc/ChangeLog
* common/config/i386/cpuinfo.h (get_intel_cpu): Add new model values
to Alderlake and Rocketlake.
---
gcc/comm
Hi Uros,
This patch is to update mtune for tremont.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
Silvermont has special handling in the add_stmt_cost function because it has an
in-order SIMD pipeline. But for Tremont, its SIMD pipeline is out of order,
remove Tremont
Hi Uros,
This patch is to update mtune for alderlake.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
Update mtune for alderlake. Alder Lake Intel Hybrid Technology will not support
Intel® AVX-512. ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and
UMONITO
> -Original Message-
> From: Uros Bizjak
> Sent: Thursday, September 16, 2021 2:28 PM
> To: Cui, Lili
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
>
> Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
>
> On Wed, Sep 15, 20
> -Original Message-
> From: H.J. Lu
> Sent: Wednesday, September 15, 2021 10:14 PM
> To: Cui, Lili
> Cc: Uros Bizjak ; GCC Patches patc...@gcc.gnu.org>; Liu, Hongtao
> Subject: Re: [PATCH 4/4] [PATCH 4/4] x86: Add
> TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
>
> There is no nee
Hi Uros,
This patch is to synchronize Rocket Lake's processor_names and
processor_cost_table with processor_type.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
[PATCH] Synchronize Rocket Lake's processor_names and
processor_cost_table with processor_type
gcc/
Updated wwwdocs for Rocketlake [GCC11], thanks.
[PATCH] Mention Rocketlake
---
htdocs/gcc-11/changes.html | 4
1 file changed, 4 insertions(+)
diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index a7fa4e1b..38725abc 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs
Hi Uros,
This patch adds Rocket Lake to GCC.
Rocket Lake is based on Ice Lake client, minus SGX.
For detailed information, please refer to
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
Bootst
Hi Uros,
This patch is to change the Alder Lake ISA list in GCC and add m_ALDERLAKE to
m_CORE_AVX2.
Alder Lake Intel Hybrid Technology is based on Tremont plus
ADCX/AVX/AVX2/BMI/BMI2/F16C/FMA/LZCNT/
PCONFIG/PKU/VAES/VPCLMULQDQ/SERIALIZE/HRESET/KL/WIDEKL/AVX-VNNI
For detailed information, pleas
Hi Uros,
This patch corrects the previous patch.
PREFETCHW should be in both march=broadwell and march=silvermont,
but I moved PREFETCHW from march=broadwell to march=silvermont in the previous
patch; sorry for that.
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
[P
Hi Uros,
This patch is to correct some instruction sets for
march=Tremont/Broadwell/Silvermont/knl
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
OK for master?
[PATCH] Enable MOVDIRI, MOVDIR64B, CLDEMOTE and WAITPKG for
march=tremont
1. Enable MOVDIRI, MOVDIR64B, CLDEMOTE a
Hi:
This patch adds Sapphire Rapids and Alder Lake to GCC.
Sapphire Rapids is based on Cooper Lake plus the ISAs
MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT/ENQCMD/CLDEMOTE/PTWRITE/WAITPKG/SERIALIZE/TSXLDTRK.
Alder Lake is based on Skylake plus the ISAs CLDEMOTE/PTWRITE/WAITPKG/SERIALIZE.
For de
Hi Uros,
This patch is to fix the bitmask conflict between PTA_AVX512VP2INTERSECT and
PTA_WAITPKG in gcc/config/i386/i386.h.
Bootstrap is ok, make-check ok for i386 target. Ok for trunk?
gcc/ChangeLog:
* config/i386/i386.h (PTA_WAITPKG): Change bitmask value.
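
A conflict of this kind arises when two feature flags are given the same bit, so that enabling one silently enables the other. The sketch below uses hypothetical FLAG_* names and values, not the real PTA_* definitions from i386.h, and shows one way to catch such overlaps at compile time.

```c
#include <assert.h>

/* Hypothetical feature flags; each must own a distinct bit.  */
#define FLAG_A (1ULL << 0)
#define FLAG_B (1ULL << 1)
#define FLAG_C (1ULL << 2)

/* Compile-time checks: the build fails if two flags share a bit.  */
static_assert ((FLAG_A & FLAG_B) == 0, "FLAG_A and FLAG_B overlap");
static_assert ((FLAG_A & FLAG_C) == 0, "FLAG_A and FLAG_C overlap");
static_assert ((FLAG_B & FLAG_C) == 0, "FLAG_B and FLAG_C overlap");

/* Return nonzero if FLAG is enabled in SET.  */
static int
has_flag (unsigned long long set, unsigned long long flag)
{
  return (set & flag) != 0;
}
```

If FLAG_B were mistakenly defined with the same shift as FLAG_C, has_flag would report both as enabled whenever either was set, and the static_assert lines would stop the build instead.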
---
gcc/config/i386/i386