Re: Question about the SLP vectorizer failed to perform automatic vectorization in one case

2024-05-27 Thread Hanke Zhang via Gcc
'm trying to study the automatic vectorization optimization in GCC, > > but I found one case that SLP vectorizer failed to do such things. > > > > Here is the sample code: (also a simplified version of a function > > from the 625/525.x264 source code in SPEC CPU 2017) >

Re: Question about the SLP vectorizer failed to perform automatic vectorization in one case

2024-05-27 Thread Richard Biener via Gcc
On Sat, May 25, 2024 at 3:08 PM Hanke Zhang via Gcc wrote: > > Hi, > I'm trying to study the automatic vectorization optimization in GCC, > but I found one case that SLP vectorizer failed to do such things. > > Here is the sample code: (also a simplified version of a

Question about the SLP vectorizer failed to perform automatic vectorization in one case

2024-05-25 Thread Hanke Zhang via Gcc
Hi, I'm trying to study the automatic vectorization optimization in GCC, but I found one case that SLP vectorizer failed to do such things. Here is the sample code: (also a simplified version of a function from the 625/525.x264 source code in SPEC CPU 2017) void pixel_sub_wxh(int16_t

Question about vectorization optimization during RTL-PASS

2023-11-12 Thread Hanke Zhang via Gcc
Hi, I've been working on vectorization-related optimization lately. GCC seems to have some optimization vulnerabilities. I would like to ask if it can be solved. For example, for the following program using AVX2: #include // reg->node2[i].state is an unsigned long long variable // reg-

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-26 Thread Richard Biener via Gcc
> > > > > > > > > > I came across a more complex version of that and found that gcc > > > > > doesn't seem to handle it, so wanted to write a pass myself to > > > > > optimize it. > > > > > > > > > > I got two question

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-23 Thread Hanke Zhang via Gcc
ation, where should I put it? Put it behind the > > > > pass_iv_optimize? > > > > > > GCC has the final value replacement pass (pass_scev_cprop) doing these > > > kind of transforms. Since 'ans' does not have an affine evolution this >

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-23 Thread Richard Biener via Gcc
s > > case would need to be pattern matched (there are some existing pattern > > matchings in the pass). > > > > > Thanks > > > Hanke Zhang > > > > > > Richard Biener 于2023年10月17日周二 20:00写道: > > > > > > > > On Tue, Oct 17,

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-23 Thread Hanke Zhang via Gcc
g > > > > Richard Biener 于2023年10月17日周二 20:00写道: > > > > > > On Tue, Oct 17, 2023 at 1:54 PM Hanke Zhang wrote: > > > > > > > > Richard Biener 于2023年10月17日周二 17:26写道: > > > > > > > > > > On Thu, Oct 12, 2

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-19 Thread Richard Biener via Gcc
10月17日周二 20:00写道: > > > > On Tue, Oct 17, 2023 at 1:54 PM Hanke Zhang wrote: > > > > > > Richard Biener 于2023年10月17日周二 17:26写道: > > > > > > > > On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc > > > > wrote: > > > > > > >

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-17 Thread Hanke Zhang via Gcc
> > > On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc > > > wrote: > > > > > > > > Hi, I'm recently working on vectorization of GCC. I'm stuck in a small > > > > problem and would like to ask for advice. > > > > >

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-17 Thread Hanke Zhang via Gcc
Richard Biener 于2023年10月17日周二 17:26写道: > > On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc wrote: > > > > Hi, I'm recently working on vectorization of GCC. I'm stuck in a small > > problem and would like to ask for advice. > > > > For exa

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-17 Thread Richard Biener via Gcc
On Tue, Oct 17, 2023 at 1:54 PM Hanke Zhang wrote: > > Richard Biener 于2023年10月17日周二 17:26写道: > > > > On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc wrote: > > > > > > Hi, I'm recently working on vectorization of GCC. I'm stuck in a small

Re: the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-17 Thread Richard Biener via Gcc
On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc wrote: > > Hi, I'm recently working on vectorization of GCC. I'm stuck in a small > problem and would like to ask for advice. > > For example, for the following code: > > int main() { > int size = 1000; >

the elimination of if blocks in GCC during if-conversion and vectorization

2023-10-12 Thread Hanke Zhang via Gcc
Hi, I've recently been working on vectorization in GCC. I'm stuck on a small problem and would like to ask for advice. For example, for the following code: int main() { int size = 1000; int *foo = malloc(sizeof(int) * size); int c1 = rand(), t1 = rand(); for (int i = 0; i

Re: How to make parallelizing loops and vectorization work at the same time?

2023-09-17 Thread Richard Biener via Gcc
> > expected to change anything. Instead you can use -fno-tree-vectorize on > > > > the second last one. Doing that I get 111s vs 41s thus doing both > > > > helps. > > > > > > > > Note parallelization hasn't seen any development in th

Re: How to make parallelizing loops and vectorization work at the same time?

2023-09-15 Thread Hanke Zhang via Gcc
he second last one. Doing that I get 111s vs 41s thus doing both helps. > > > > > > Note parallelization hasn't seen any development in the last years. > > > > > > Richard. > > > > Hi Richard: > > > > Thank you for your sincere

Re: How to make parallelizing loops and vectorization work at the same time?

2023-09-15 Thread Richard Biener via Gcc
e following after I add > `-fopt-info-vec`: > > gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec > > test.c:29:5: optimized: loop vectorized using 32 byte vectors > gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec -ftree-parallelize-loops=24 > > nothing happened > > That

Re: How to make parallelizing loops and vectorization work at the same time?

2023-09-15 Thread Hanke Zhang via Gcc
incere reply. I get what you mean above. But I still see the following after I add `-fopt-info-vec`: gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec > test.c:29:5: optimized: loop vectorized using 32 byte vectors gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec -ftree-parallelize-loops=24 > nothing

Re: How to make parallelizing loops and vectorization work at the same time?

2023-09-15 Thread Richard Biener via Gcc
ime, if I use > the option of `-ftree-parallelize-loops` alone, it will also bring a > big efficiency gain. But if I use both options, vectorization fails, > that is, I can't get the benefits of vectorization, I can only get the > benefits of parallelizing loops. > > I know th

How to make parallelizing loops and vectorization work at the same time?

2023-09-15 Thread Hanke Zhang via Gcc
g efficiency gain compared to doing nothing; At the same time, if I use the option of `-ftree-parallelize-loops` alone, it will also bring a big efficiency gain. But if I use both options, vectorization fails, that is, I can't get the benefits of vectorization, I can only get the benefits of p

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
, > V2QI > > There are 11 modes. > Should I increase the number from 8 to 11? It will just perform dynamic allocation, no need to adjust. > Thanks. > > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-08-31 19:29 > To: juzhe.zh...@rivai.ai >

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Biener Date: 2023-08-31 19:29 To: juzhe.zh...@rivai.ai CC: gcc; richard.sandiford Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi. Thanks Richard and Richi. > > Now, I figure out how to choose sma

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi. Thanks Richard and Richi. > > Now, I figure out how to choose smaller LMUL now. > > void > costs::finish_cost (const vector_costs *scalar_costs) >

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
,v4,v8 > sub a2,a2,a5 > vsetvli zero,a5,e32,m4,ta,ma > vse32.v v4,0(a4) > add a0,a0,a3 > add a1,a1,a3 > add a4,a4,a3 > bne a2,zero,.L3 > > Fantastic architecture of GCC Vector Cost model! > > Thanks a lot. > > > juzhe.zh...@rivai.ai > > From: Richard Bi

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
diford Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote: > Thanks Richi. > > I am trying to figure out how to adjust finish_cost to lower the LMUL > > For example: > > void > foo (int32_t *__res

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
v v4,0(a4) > add a0,a0,a3 > add a1,a1,a3 > add a4,a4,a3 > bne a2,zero,.L3 > .L5: > ret > > I am experimenting whether we can adjust cost statically to make loop > vectorizer use LMUL = 4 even though preferred_simd_mode return LMUL = 8. > If we can do that, I think we can ap

Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Sandiford via Gcc
"juzhe.zh...@rivai.ai" writes: > Thanks Richi. > > I am trying to figure out how to adjust finish_cost to lower the LMUL > > For example: > > void > foo (int32_t *__restrict a, int32_t *__restrict b, int n) > { > for (int i = 0; i < n; i++) > a[i] = a[i] + b[i]; > } > > preferred_simd_mode p

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
oosing vectorization factor for RVV On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi, Richard and Richi. > > Currently, we are statically returning vectorization factor in > 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE' > according to compile option. > > For example

Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi, Richard and Richi. > > Currently, we are statically returning vectorization factor in > 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE' > according to compile option. > > For example: > void > foo (int32_t *__r

Question about dynamic choosing vectorization factor for RVV

2023-08-30 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi. Currently, we are statically returning vectorization factor in 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE' according to compile option. For example: void foo (int32_t *__restrict a, int32_t *__restrict b, int n) { for (int i = 0; i < n; i++) a[i] = a[i] +

Re: Libgcc divide vectorization question

2023-03-23 Thread Richard Biener via Gcc
On Wed, Mar 22, 2023 at 4:57 PM Andrew Stubbs wrote: > > On 22/03/2023 13:56, Richard Biener wrote: > >> Basically, the -ffast-math instructions will always be the fastest way, > >> but the goal is that the default optimization shouldn't just disable > >> ve

Re: Libgcc divide vectorization question

2023-03-22 Thread Andrew Stubbs
On 22/03/2023 13:56, Richard Biener wrote: Basically, the -ffast-math instructions will always be the fastest way, but the goal is that the default optimization shouldn't just disable vectorization entirely for any loop that has a divide in it. We try to express division as multiplication

Re: Libgcc divide vectorization question

2023-03-22 Thread Richard Biener via Gcc
ly an option, but I think there's > quite a lot of code in those routines. I know how to do that option at > least (except, maybe not the errno handling without making assumptions > about the C runtime). > > Basically, the -ffast-math instructions will always be the fastest way

Re: Libgcc divide vectorization question

2023-03-22 Thread Andrew Stubbs
goal is that the default optimization shouldn't just disable vectorization entirely for any loop that has a divide in it. Andrew

Re: Libgcc divide vectorization question

2023-03-22 Thread Richard Biener via Gcc
On Tue, Mar 21, 2023 at 6:00 PM Andrew Stubbs wrote: > > Hi all, > > I want to be able to vectorize divide operators (softfp and integer), > but amdgcn only has hardware instructions suitable for -ffast-math. > > We have recently implemented vector versions of all the libm functions, > but the lib

Libgcc divide vectorization question

2023-03-21 Thread Andrew Stubbs
Hi all, I want to be able to vectorize divide operators (softfp and integer), but amdgcn only has hardware instructions suitable for -ffast-math. We have recently implemented vector versions of all the libm functions, but the libgcc functions aren't builtins and therefore don't use those hoo

Re: Why vectorization didn't turn on by -O2

2021-08-05 Thread Hongtao Liu via Gcc
ites: > > >> > Alternatively only enable loop vectorization at -O2 (the above checks > > >> > flag_tree_slp_vectorize as well). At least the cost model kind > > >> > does not have any influence on BB vectorization, that is, we get the > > >> >

Re: Hongtao Liu as x86 vectorization maintainer

2021-06-22 Thread Hongtao Liu via Gcc
Hongtao > > >Cc: gcc Mailing List ; Marek Polacek > > >Subject: Hongtao Liu as x86 vectorization maintainer > > > > > >I am pleased to announce that the GCC Steering Committee has appointed > > >Hongtao Liu as maintainer of the i386 vector extensions in GCC.

Re: Hongtao Liu as x86 vectorization maintainer

2021-06-22 Thread Jakub Jelinek via Gcc
On Mon, Jun 21, 2021 at 02:49:56AM +, Liu, Hongtao via Gcc wrote: > >-Original Message- > >From: Jason Merrill > >Sent: Monday, June 21, 2021 10:07 AM > >To: Liu, Hongtao > >Cc: gcc Mailing List ; Marek Polacek > >Subject: Hongtao Liu as x86

RE: Hongtao Liu as x86 vectorization maintainer

2021-06-20 Thread Liu, Hongtao via Gcc
>-Original Message- >From: Jason Merrill >Sent: Monday, June 21, 2021 10:07 AM >To: Liu, Hongtao >Cc: gcc Mailing List ; Marek Polacek >Subject: Hongtao Liu as x86 vectorization maintainer > >I am pleased to announce that the GCC Steering Committee has a

Hongtao Liu as x86 vectorization maintainer

2021-06-20 Thread Jason Merrill via Gcc
I am pleased to announce that the GCC Steering Committee has appointed Hongtao Liu as maintainer of the i386 vector extensions in GCC. Hongtao, please update your listing in the MAINTAINERS file. Cheers, Jason

Re: Vectorization of loop which operate on local arrays

2020-04-14 Thread Richard Biener via Gcc
On Tue, Apr 14, 2020 at 4:39 PM Shubham Narlawar via Gcc wrote: > > Hello, > > I am working on gcc-4.9.4 and encountered different results of loop > vectorization on array arr0, arr1 and arr2. > > Testcase - > > int main() > { > int i; > for (i=0; i

Vectorization of loop which operate on local arrays

2020-04-14 Thread Shubham Narlawar via Gcc
Hello, I am working on gcc-4.9.4 and encountered different results of loop vectorization on array arr0, arr1 and arr2. Testcase - int main() { int i; for (i=0; i<64; i++) { arr2[i]=(arr1[i]|arr0[i]); } } Using -O2 -ftree-vectorize, Above loop is vectorized
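For readers who want to try the testcase quoted above, here is a minimal compilable sketch of it. The snippet does not show how arr0, arr1 and arr2 are declared, so the file-scope int arrays below are an assumption, and fill_inputs and or_arrays are hypothetical helper names that do not appear in the thread.

```c
#define N 64

/* Assumed declarations: the quoted testcase does not show them. */
int arr0[N], arr1[N], arr2[N];

/* Hypothetical helper: give the inputs values the compiler cannot
   trivially fold away when this is built as a separate unit. */
void fill_inputs(void)
{
    for (int i = 0; i < N; i++) {
        arr0[i] = i;
        arr1[i] = i << 8;
    }
}

/* The loop from the testcase: element-wise bitwise OR of two arrays. */
void or_arrays(void)
{
    for (int i = 0; i < N; i++)
        arr2[i] = arr1[i] | arr0[i];
}
```

Building this with -O2 -ftree-vectorize, plus -fopt-info-vec on a GCC release that supports it, should report whether the loop in or_arrays was vectorized. The result may differ from what the thread describes, depending on GCC version, target, and where the arrays are declared.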

Re: Vectorization Messages

2020-03-24 Thread Richard Biener via Gcc
On March 24, 2020 5:45:05 PM GMT+01:00, Roger Martz via Gcc wrote: >I was glad to see that compiler flags such as -fopt-info-vec-missed ... >provide information about what is happening under the hood w.r.t code >that >can and can't be vectorized. > >Can anyone point me to a document, etc. that wo

Vectorization Messages

2020-03-24 Thread Roger Martz via Gcc
I was glad to see that compiler flags such as -fopt-info-vec-missed ... provide information about what is happening under the hood w.r.t code that can and can't be vectorized. Can anyone point me to a document, etc. that would be helpful in understanding what the messages output from the compiler

Re: SLP-based reduction vectorization

2019-01-24 Thread Richard Biener
> 2. The current version considers only PLUS reduction > >> as it is encountered most often and therefore is the > >> most practical; > >> > >> 3. While normally SLP transformation should operate > >> inside single basic block this requirement greatly > >

Re: SLP-based reduction vectorization

2019-01-24 Thread Anton Youdkevitch
sake the current version does not deal with partial reductions which would require partial sum merging and careful removal of the scalars that participate in the vector part. The latter gets done automatically by DCE in the case of full reduction vectorization; 5. There is no cost model yet for the

Re: SLP-based reduction vectorization

2019-01-24 Thread Richard Biener
ll be vectorizable subexpressions > defined in basic block(s) different from that where the > reduction result resides. However, for the sake of > simplicity only single uses in the same block are > considered now; > > 4. For the same sake the current version does not deal >

SLP-based reduction vectorization

2019-01-21 Thread Anton Youdkevitch
se of full reduction vectorization; 5. There is no cost model yet for the reasons mentioned in the paragraphs 3 and 4. Thanks in advance. -- Anton >From eb2644765d68ef1c629e584086355a8d66df7c73 Mon Sep 17 00:00:00 2001 From: Anton Youdkevitch Date: Fri, 9 Nov 2018 20:50:05 +0300 Subject: [PATCH

Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-07 Thread Jan Hubicka
> On Mon, Jan 07, 2019 at 09:29:09AM +0100, Richard Biener wrote: > > On Sun, 6 Jan 2019, Jan Hubicka wrote: > > > Even though it is late in release cycle I wonder if we can do that for > > > GCC 9? Performance of vectorization is very architecture specific, I &

Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-07 Thread Segher Boessenkool
On Mon, Jan 07, 2019 at 09:29:09AM +0100, Richard Biener wrote: > On Sun, 6 Jan 2019, Jan Hubicka wrote: > > Even though it is late in release cycle I wonder if we can do that for > > GCC 9? Performance of vectorization is very architecture specific, I > > would propose enabl

Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-07 Thread Jan Hubicka
hmark difference between > > cost models. > > ; Alias to enable both -ftree-loop-vectorize and -ftree-slp-vectorize. > ftree-vectorize > Common Report Optimization > Enable vectorization on trees. Thanks! I would probably fall into that trap and run same set of benchmarks again. Honza > > -- > Eric Botcazou

Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-07 Thread Eric Botcazou
> cost models. ; Alias to enable both -ftree-loop-vectorize and -ftree-slp-vectorize. ftree-vectorize Common Report Optimization Enable vectorization on trees. -- Eric Botcazou
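As a rough illustration of the two vectorizers that the quoted option text says -ftree-vectorize enables, the sketch below shows one function shaped for the loop vectorizer and one shaped for the SLP (basic-block) vectorizer. The function names are hypothetical, and whether GCC actually vectorizes either one depends on the version, target and cost model.

```c
/* Candidate for the loop vectorizer (-ftree-loop-vectorize):
   the same operation repeated across loop iterations. */
void add_loop(int *restrict dst, const int *restrict a,
              const int *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Candidate for the SLP vectorizer (-ftree-slp-vectorize):
   four isomorphic statements on adjacent elements, with no loop. */
void add_block4(int *restrict dst, const int *restrict a,
                const int *restrict b)
{
    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
    dst[2] = a[2] + b[2];
    dst[3] = a[3] + b[3];
}
```

Compiling with only one of the two flags and -fopt-info-vec is one way to see which pass picks up which function.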

Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-07 Thread Richard Biener
t2k17 zen generic +3.61% > SPECint2k17 zen native +5.18% > > The performance results seem surprisingly a lot in favor of > vectorization. Martin's setup is also checking code size which goes up > by as much as 26% on leslie 3d, but since many of benchmarks ar

Enabling vectorization at -O2 for x86 generic, core and zen tuning

2019-01-06 Thread Jan Hubicka
p2k6 zen generic +9.98% SPECfp2k6 zen native +7.04% SPECfp2k17 zen generic +6.11% SPECfp2k17 zen native +5.46% SPECint2k17 zen generic +3.61% SPECint2k17 zen native +5.18% The performance results seem surprisingly a lot in favor of vectorization. Martin

Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)

2018-11-04 Thread Marc Glisse
(resent because of mail issues on my end) On Mon, 22 Oct 2018, Thomas Schwinge wrote: I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsi

Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)

2018-10-23 Thread Richard Biener
On Mon, Oct 22, 2018 at 6:35 PM Thomas Schwinge wrote: > > Hi! > > Thanks for all your comments already! I continued looking into this for a > bit (but then got interrupted by a higher-priority task). Regarding this > one specifically: > > On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse wrote: >

"match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)

2018-10-22 Thread Thomas Schwinge
Hi! Thanks for all your comments already! I continued looking into this for a bit (but then got interrupted by a higher-priority task). Regarding this one specifically: On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse wrote: > On Fri, 12 Oct 2018, Thomas Schwinge wrote: > > > Hmm, and without a

Re: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-19 Thread Richard Sandiford
Tamar Christina writes: >> > so I'd need 5 parameters and then I'm guessing the other expressions >> would be removed by DCE at some point? >> >> Are you planning to make the FCMLA behaviour directly available as an >> internal function or provide a higher-level one that does a full complex >> mu

RE: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-18 Thread Tamar Christina
Hi Richard, Thanks for all the help so far, > > so I'd need 5 parameters and then I'm guessing the other expressions > would be removed by DCE at some point? > > Are you planning to make the FCMLA behaviour directly available as an > internal function or provide a higher-level one that does a fu

Re: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-17 Thread Richard Sandiford
Tamar Christina writes: > Hi Richard, >> > [...] >> > 3) So I abandoned vec-patterns and instead tried to do it in >> > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree >> > is created. Matching the SLP tree is quite simple and getting it to >> > emit the right SLP tree was si

RE: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-17 Thread Tamar Christina
Hi Richard, > > [...] > > 3) So I abandoned vec-patterns and instead tried to do it in > > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree > > is created. Matching the SLP tree is quite simple and getting it to > > emit the right SLP tree was simple enough, except that at this

Re: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-16 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > I am trying to add support to the auto-vectorizer for complex operations where > a target has instructions for. > > The instructions I have are only available as vector instructions. The > operations > are complex addition with a rotation or complex fmla wit

[RFC][GCC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-16 Thread Tamar Christina
Hi All, I am trying to add support to the auto-vectorizer for complex operations where a target has instructions for. The instructions I have are only available as vector instructions. The operations are complex addition with a rotation or complex fmla with a rotation for half floats, floats an

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Sebastian Pop
>_3 = (sizetype) i_11; >_4 = _2 + _3; > > and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least > I think that's true). > > If this folding is correct, the dependence analysis would not have to handle array accesses with div and mod, and it would b

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Richard Biener
< m1; j++) { a[i][j] = omp_get_thread_num (); } if (m_tail1) for (int j = 0; j < m_tail1; j++) ... with appropriate start/end for the i/j loop and the "epilogue" loop? > > That is, can we delay the actual collapsing until after vectorization > &

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Jakub Jelinek
teration space might be diagonal or other not exactly rectangular. > That is, can we delay the actual collapsing until after vectorization > for example? No. We can come up with some way to propagate some of the original info to the vectorizer if it helps (or teach vectorizer to recognize whate

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Richard Biener
> it more, etc. > If we come up with some way to help the vectorizer with the collapsed loop, > whether in a form of some loop flags, or internal fns, whatever, I'm all for > it. But isn't _actual_ collapsing an implementation detail? That is, isn't it enough to interpret clauses in terms of the collapse result? That is, can we delay the actual collapsing until after vectorization for example? Richard. > > Jakub

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Jakub Jelinek
On Mon, Oct 15, 2018 at 10:55:26AM +0200, Richard Biener wrote: > Yeah. Note this still makes the IVs not analyzable since i now effectively > becomes wrapping in the inner loop. For some special values we might > get away with a wrapping CHREC in a bit-precision type but we cannot > represent wr

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-15 Thread Richard Biener
er loop: >: > i = i.0; > j = j.1; > _1 = a[i][j]; > _2 = _1 + 1; > a[i][j] = _2; > .iter.4 = .iter.4 + 1; > j.1 = j.1 + 1; > D.2912 = j.1 < n.7 ? 0 : 1; > i.0 = D.2912 + i.0; > j.1 = j.1 < n.7 ? j.1 : 0; > >: > if (.iter.

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Jakub Jelinek
i.0 = D.2912 + i.0; j.1 = j.1 < n.7 ? j.1 : 0; : if (.iter.4 < D.2902) goto ; [87.50%] else goto ; [12.50%] to make it more vectorization friendly (though, in this particular case it isn't vectorized either) and not do the expensive % and / operations inside
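The index update in the snippet above can be written out in plain C roughly as follows. This is only a sketch: the bounds M and N and all function names are hypothetical, and the loop body stores the iteration counter rather than doing the a[i][j] + 1 update from the thread, so that the two variants can be compared. The first function recovers i and j with / and % on every iteration; the second carries them across iterations and wraps j with conditional expressions, mirroring the j.1 and i.0 updates quoted above.

```c
enum { M = 7, N = 5 };  /* hypothetical outer and inner bounds */

int a_divmod[M][N];
int a_incr[M][N];

/* Collapsed loop that recomputes both indices with / and %. */
void fill_divmod(void)
{
    for (int iter = 0; iter < M * N; ++iter) {
        int i = iter / N;
        int j = iter % N;
        a_divmod[i][j] = iter;
    }
}

/* Same iteration space, but the indices are carried between
   iterations: j is incremented, and when it reaches N it is reset
   to 0 and the carry is added to i. */
void fill_incremental(void)
{
    int i = 0, j = 0;
    for (int iter = 0; iter < M * N; ++iter) {
        a_incr[i][j] = iter;
        j = j + 1;
        int carry = j < N ? 0 : 1;
        i = i + carry;
        j = j < N ? j : 0;
    }
}

/* Returns 1 when both variants wrote identical contents. */
int same_result(void)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            if (a_divmod[i][j] != a_incr[i][j])
                return 0;
    return 1;
}
```

The incremental form avoids the division and modulo in the loop body. As the snippet itself notes, that alone did not get the loop in the thread vectorized.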

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Marc Glisse
On Fri, 12 Oct 2018, Thomas Schwinge wrote: Hmm, and without any OpenACC/OpenMP etc., actually the same problem is also present when running the following code through the vectorizer: for (int tmp = 0; tmp < N_J * N_I; ++tmp) { int j = tmp / N_I; int i = tmp % N_I;

Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Thomas Schwinge
Hi! I'm for the first time looking into the existing vectorization functionality in GCC (yay!), and with that I'm also for the first time encountering GCC's scalar evolution (scev) machinery (yay!), and the chains of recurrences (chrec) used by that (yay!). Obviously, I'm ri

Re: LP64, unsigned int, vectorization, and PR 61247

2018-10-04 Thread Andrew Pinski
On Thu, Oct 4, 2018 at 1:49 PM Steve Ellcey wrote: > > I was looking at PR tree-optimization/61247, where a loop with an unsigned > int index on an LP64 platform was not getting vectorized and I noticed an > odd thing. In the function below, if I define N as 1000 or 1, the > loop does get vec

LP64, unsigned int, vectorization, and PR 61247

2018-10-04 Thread Steve Ellcey
I was looking at PR tree-optimization/61247, where a loop with an unsigned int index on an LP64 platform was not getting vectorized and I noticed an odd thing.  In the function below, if I define N as 1000 or 1, the loop does get vectorized, even in LP64 mode.  But if I define N as 10, the

An issue on loop optimization/vectorization

2018-07-11 Thread jiangning liu
+ s2[x]; d += 16; } } If we change “for( int x = 0; x < 16; x++ )” to be like “for( int x = 0; x < 32; x++ )”, very beautiful vectorization code would be generated, test_loop: .LFB0: .cfi_startproc adrp x2, g_s1 adrp x3, g_s2 ad

Re: Interesting statistics on vectorization for Skylake avx512 (i9-7900) - 8.1 vs. 7.3.

2018-05-04 Thread Richard Biener
On Thu, May 3, 2018 at 8:43 PM, Toon Moene wrote: > Consider the attached Fortran code (the most expensive routine, > computation-wise, in our weather forecasting model). > > verint.s.7.3 is the result of: > > gfortran -g -O3 -S -march=native -mtune=native verint.f > > using release 7.3. > > verin

Interesting statistics on vectorization for Skylake avx512 (i9-7900) - 8.1 vs. 7.3.

2018-05-03 Thread Toon Moene
Consider the attached Fortran code (the most expensive routine, computation-wise, in our weather forecasting model). verint.s.7.3 is the result of: gfortran -g -O3 -S -march=native -mtune=native verint.f using release 7.3. verint.s.8.1 is the result of: gfortran -g -O3 -S -march=native -mtun

Re: Vectorization / libmvec / libgomp question

2018-02-23 Thread Jakub Jelinek
On Fri, Feb 23, 2018 at 11:44:40AM -0800, Steve Ellcey wrote: > I have a question about loop vectorization, OpenMP, and libmvec.  I am > experimenting with this on Aarch64 and looking at what exists on x86 > and trying to understand the relationship (if there is one) between the > ve

Vectorization / libmvec / libgomp question

2018-02-23 Thread Steve Ellcey
I have a question about loop vectorization, OpenMP, and libmvec.  I am experimenting with this on Aarch64 and looking at what exists on x86 and trying to understand the relationship (if there is one) between the vector library (libmvec) and OpenMP (libgomp). On x86, an OpenMP loop with a sin

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-21 Thread Richard Biener
On October 21, 2017 9:50:13 PM GMT+02:00, Denis Bakhvalov wrote: >Hello Richard, >Thank you. I achieved vectorization with vf = 16, using >#pragma GCC optimize ("no-unroll-loops") >__attribute__ ((__target__ ("sse4.2"))) >and options -march=core-avx2 -mprefer-a

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-21 Thread Denis Bakhvalov
Hello Richard, Thank you. I achieved vectorization with vf = 16, using #pragma GCC optimize ("no-unroll-loops") __attribute__ ((__target__ ("sse4.2"))) and options -march=core-avx2 -mprefer-avx-128 But now I have a question: Is it possible in gcc to have vectorization with vf

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-20 Thread Richard Biener
roll >>> > this vectorized loop by some defined factor." >>> > >>> > I was playing with #pragma omp simd with the safelen clause, and >>> > #pragma GCC optimize("unroll-loops") with no success. Compiler option >>> > -fmax-unroll-t

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-20 Thread Denis Bakhvalov
laying with #pragma omp simd with the safelen clause, and >> > #pragma GCC optimize("unroll-loops") with no success. Compiler option >> > -fmax-unroll-times is not suitable for me, because it will affect >> > other parts of the code. >> > >> > I

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-19 Thread Jakub Jelinek
not suitable for me, because it will affect > > other parts of the code. > > > > Is it possible to achieve this somehow? > > No. #pragma omp simd has simdlen clause which is a hint on the preferable vectorization factor, but the vectorizer doesn't use it so far; pro

Re: How to force gcc to vectorize the loop with particular vectorization width

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 9:22 AM, Denis Bakhvalov wrote: > Hello! > > I have a hot inner loop which was vectorized by gcc, but I also want > the compiler to unroll this loop by some factor. > It can be controlled in clang with this pragma: > #pragma clang loop vectorize(enable) vectorize_width(8) > Plea

How to force gcc to vectorize the loop with particular vectorization width

2017-10-19 Thread Denis Bakhvalov
Hello! I have a hot inner loop which was vectorized by gcc, but I also want the compiler to unroll this loop by some factor. It can be controlled in clang with this pragma: #pragma clang loop vectorize(enable) vectorize_width(8) Please see example here: https://godbolt.org/g/UJoUJn So I want to tell g

Re: SPEC 456.hmmer vectorization question

2017-03-09 Thread Richard Biener
On Thu, Mar 9, 2017 at 9:12 AM, Jakub Jelinek wrote: > On Thu, Mar 09, 2017 at 09:02:38AM +0100, Richard Biener wrote: >> It would need to be done before graphite, and yes, the question is when >> to do this (given the non-trivial text size and runtime cost). One option is >> to do sth similar lik

Re: SPEC 456.hmmer vectorization question

2017-03-09 Thread Jakub Jelinek
On Thu, Mar 09, 2017 at 09:02:38AM +0100, Richard Biener wrote: > It would need to be done before graphite, and yes, the question is when > to do this (given the non-trivial text size and runtime cost). One option is > to do sth similar like we do with IFN_LOOP_VECTORIZED, that is, after > followup

Re: SPEC 456.hmmer vectorization question

2017-03-09 Thread Richard Biener
On Wed, Mar 8, 2017 at 8:41 PM, Steve Ellcey wrote: > On Tue, 2017-03-07 at 14:45 +0100, Michael Matz wrote: >> Hi Steve, >> >> On Mon, 6 Mar 2017, Steve Ellcey wrote: >> >> > >> > I was looking at the SPEC 456.hmmer benchmark and this email string >> > from Jeff Law and Michael Matz: >> > >> >

Re: SPEC 456.hmmer vectorization question

2017-03-08 Thread Steve Ellcey
On Tue, 2017-03-07 at 14:45 +0100, Michael Matz wrote: > Hi Steve, > > On Mon, 6 Mar 2017, Steve Ellcey wrote: > > > > > I was looking at the SPEC 456.hmmer benchmark and this email string > > from Jeff Law and Michael Matz: > > > >   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html > >

Re: SPEC 456.hmmer vectorization question

2017-03-07 Thread Michael Matz
Hi Steve, On Mon, 6 Mar 2017, Steve Ellcey wrote: > I was looking at the SPEC 456.hmmer benchmark and this email string > from Jeff Law and Michael Matz: > > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html > > and was wondering if anyone was looking at what more it would take > for G

SPEC 456.hmmer vectorization question

2017-03-06 Thread Steve Ellcey
I was looking at the SPEC 456.hmmer benchmark and this email string from Jeff Law and Michael Matz: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html and was wondering if anyone was looking at what more it would take for GCC to vectorize the loop in P7Viterbi. There is a big performanc

Re: Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Richard Biener
s? > I am doing some experiments calculating coarse-grained register > pressure for GIMPLE loop, but the motivation is not from vectorizer, > but predcom/pre, like PR77498. > >> Perhaps something we could/should fix in the s390 backend? (Probably >> hard to tell without source)

Re: Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Bin.Cheng
R77498. > Perhaps something we could/should fix in the s390 backend? (Probably > hard to tell without source) > > - Would it make sense to allow a backend to specify the minimal number > of loop iterations considered for vectorization? Is this > perhaps already possible somehow? I

Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Robin Dapp
a backend to specify the minimal number of loop iterations considered for vectorization? Is this perhaps already possible somehow? I added a check to disable vectorization for loops with <= 3 iterations that shows no regressions and improves two SPEC benchmarks noticeably. I'm even con

Re: vectorization ICE for aarch64/armhf on SPEC2006 h264ref

2016-01-13 Thread Andrew Pinski
On Tue, Jan 12, 2016 at 11:05 PM, Jim Wilson wrote: > On Tue, Jan 12, 2016 at 2:22 PM, Jim Wilson wrote: >> I see a number of places in tree-vect-generic.c that add a >> VIEW_CONVERT_EXPR if useless_type_conversion_p is false. That should >> work, except when I try this, I see that the VIEW_CON

Re: vectorization ICE for aarch64/armhf on SPEC2006 h264ref

2016-01-12 Thread Jim Wilson
On Tue, Jan 12, 2016 at 2:22 PM, Jim Wilson wrote: > I see a number of places in tree-vect-generic.c that add a > VIEW_CONVERT_EXPR if useless_type_conversion_p is false. That should > work, except when I try this, I see that the VIEW_CONVERT_EXPR gets > converted to a NOP_EXPR by gimplify_build

vectorization ICE for aarch64/armhf on SPEC2006 h264ref

2016-01-12 Thread Jim Wilson
I'm looking at an ICE on SPEC 2006 464.h264ref slice.c that occurs with -O3 for both aarch64 and armhf. palantir:2080$ ./xgcc -B./ -O3 -S slice.i slice.c: In function ‘poc_ref_pic_reorder’: slice.c:838:6: error: incorrect type of vector CONSTRUCTOR elements {_48, _55, _189, _59} vect_no_reorder_

Re: Prototype implementation: Improving effectiveness and generality of auto-vectorization

2016-01-11 Thread Richard Biener
On Fri, Jan 8, 2016 at 5:11 PM, Alan Lawrence wrote: > On Tues, Oct 27, 2015 at 2:39 PM, Richard Biener > wrote: >> >> On Mon, Oct 26, 2015 at 6:59 AM, sameera >> wrote: >>> >>> >>> Richard, we have defined the input language for convenience in prototype >>> implementation. However, we will be u
